Animals and humans make decisions based on their expected outcomes. Since relevant 
outcomes are often delayed, perceiving delays and choosing between earlier vs. later 
rewards (intertemporal decision-making) is an essential component of animal behavior. 
The myriad observations made in experiments studying intertemporal decision-making and 
time perception have not yet been rationalized within a single theory. Here we present a 
theory — Training-Integrated Maximized Estimation of Reinforcement Rate (TIMERR) — that 
explains a wide variety of behavioral observations made in intertemporal decision-making 
and the perception of time. Our theory postulates that animals make intertemporal choices 
to optimize expected reward rates over a limited temporal window which includes a past 
integration interval — over which experienced reward rate is estimated — as well as the 
expected delay to future reward. Using this theory, we derive mathematical expressions 
for both the subjective value of a delayed reward and the subjective representation of 
the delay. A unique contribution of our work is in finding that the past integration interval 
directly determines the steepness of temporal discounting and the non-linearity of time 
perception. In so doing, our theory provides a single framework to understand both 
intertemporal decision-making and time perception. 



Keywords: decision-making, discounting, intertemporal choice theory, time perception, impulsivity, scalar timing 



INTRODUCTION 

Survival and reproductive success depend on beneficial decision-making. 
Such decisions are guided by judgments regarding 
outcomes, which are represented as expected reinforcement 
amounts. As actual reinforcements are often available only 
after a delay, measuring delays and attributing values to rein- 
forcements that incorporate the cost of time is an essential 
component of animal behavior (Stephens and Krebs, 1986; 
Stephens, 2008). Yet, how animals perceive time and assess the 
worth of delayed outcomes — the quintessence of intertemporal 
decision-making — though fundamental, remains to be satisfac- 
torily answered (Frederick et al., 2002; Kalenscher and Pennartz, 
2008; Stephens, 2008). Rationalizing both the perception of time 
and the valuation of outcomes delayed in time in a unified 
framework would significantly improve our understanding of 
basic animal behavior, with wide-ranging applications in fields 
such as economics, ecology, psychology, cognitive disease, and 
neuroscience. 

In the past, many theories including Optimal Foraging Theory (OFT) 
(Stephens and Krebs, 1986; Stephens, 2008), Discounted Utility Theory (DUT) 
(Samuelson, 1937; Frederick et al., 2002; Kalenscher and Pennartz, 2008), 
Ecological Rationality Theory (ERT) (Bateson and Kacelnik, 1996; Stephens and 
Anderson, 2001; Stephens, 2008), as well as other psychological models 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008; Peters and Büchel, 2011; 
Van den Bos and McClure, 2013) have been proposed as solutions to the question 
of intertemporal choice. Of these, OFT, DUT, and ERT attempt to understand 
ultimate causes of behavior through general optimization criteria, whereas 
psychological models attempt to understand its proximate biological 
implementation. The algorithms specified by these prior theories and models for 
intertemporal decision-making are all defined by their temporal discounting 
function, i.e., the ratio of the subjective value of a delayed reward to the 
subjective value of the reward when presented immediately. These algorithms 
come in two major forms: hyperbolic (and hyperbolic-like) discounting functions 
(e.g., OFT and ERT) (Stephens and Krebs, 1986; Frederick et al., 2002; 
Kalenscher and Pennartz, 2008; Stephens, 2008), and exponential (and 
exponential-like, e.g., β-δ; Frederick et al., 2002; Peters and Büchel, 2011; 
Van den Bos and McClure, 2013) discounting functions (e.g., DUT) (Samuelson, 
1937; Frederick et al., 2002; Kalenscher and Pennartz, 2008). Hyperbolic 
discounting functions have been widely considered to be better fits to 
behavioral data than exponential functions (Frederick et al., 2002; Kalenscher 
and Pennartz, 2008). 

None of these theories and models can systematically explain 
the breadth of data on intertemporal decision-making; we argue 
that the inability of prior theories to rationalize behavior stems 
from the lack of biologically-realistic constraints on general opti- 
mization criteria (see next section). Further, while intertem- 
poral decision-making necessarily requires perception of time, 
theories of intertemporal decision-making and time perception 
(Gibbon et al., 1997; Lejeune and Wearden, 2006) are largely 
independent and do not attempt to rationalize both within a 
single framework. The motivation for our present work was 
to create a biologically-realistic and parsimonious theory of 
intertemporal decision-making and time perception which proposes 
an algorithmically-simple decision-making process to (1) 
maximize fitness and (2) explain the diversity of behavioral 
observations made in intertemporal decision-making and time 
perception. 

PROBLEMS WITH CURRENT THEORIES AND MODELS 

Intertemporal choice behavior has been modeled using two dis- 
similar approaches. The first approach is to develop theories 
that explore ultimate (Alcock and Sherman, 1994) causes of 
behavior through general optimization criteria (Samuelson, 1937; 
Stephens and Krebs, 1986; Bateson and Kacelnik, 1996; Stephens 
and Anderson, 2001; Frederick et al., 2002; Stephens, 2008). 
In ecology, there are two dominant theories of intertemporal 
choice, OFT and ERT. The statement of OFT posits that the 
choice behavior of animals should result from a global maximiza- 
tion of a "fitness currency" representing long-term future reward 
rate (Stephens and Krebs, 1986; Stephens, 2008). However, how 
animals could in principle achieve this goal is unclear, as they 
face at least two constraints: (1) they cannot know the future 
beyond the currently presented options, and (2) they have lim- 
ited computational/memory capacity. Owing to these constraints, 
prior algorithmic implementations of OFT assume that the cur- 
rent trial structure repeats ad-infinitum. Therefore, maximizing 
reward rates over the indefinite future can be re-written as max- 
imizing reward rates over an effective trial (including all delays 
in the trial) (Stephens and Krebs, 1986; Bateson and Kacelnik, 
1996; Stephens and Anderson, 2001; Stephens, 2008). Thus, OFT 
predicts a hyperbolic discounting function. ERT, on the other 
hand, states that it is sufficient to maximize reward rates only 
over the delay to the reward in the choice under consideration, 
(i.e., locally) to attain ecological success (Bateson and Kacelnik, 
1996; Stephens and Anderson, 2001; Stephens, 2008), also pre- 
dicting a hyperbolic discounting function. In economics, DUT 
(Samuelson, 1937; Frederick et al., 2002) posits that animals maximize 
long-term exponentially-discounted future utility so as to 
maintain temporal consistency of choice behavior (Samuelson, 
1937; Frederick et al., 2002). 

The second approach, mainly undertaken by psychologists and behavioral 
analysts, is to understand the proximate (Alcock and Sherman, 1994) origins of 
choices by modeling behavior using empirical fits to data collected from 
standard laboratory tasks (Kalenscher and Pennartz, 2008). An overwhelming 
number of these behavioral experiments, however, contradict the above 
theoretical models. Specifically, animals exhibit hyperbolic discounting 
functions, inconsistent with DUT (Frederick et al., 2002; Kalenscher and 
Pennartz, 2008; Stephens, 2008; Pearson et al., 2010), and violate the 
postulate of global reward rate maximization, inconsistent with OFT (Stephens 
and Anderson, 2001; Kalenscher and Pennartz, 2008; Stephens, 2008; Pearson 
et al., 2010). Further, there are a wide variety of observations like (1) the 
variability of discounting steepness within and across individuals (Frederick 
et al., 2002; Schweighofer et al., 2006; Luhmann et al., 2008), and many 
"anomalous" behaviors including (2) the "Magnitude Effect" (Frederick et al., 
2002; Kalenscher and Pennartz, 2008) (the steepness of discounting becomes 
lower as the magnitude of the reward increases), (3) the "Sign Effect" 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008) (gains are discounted 
more steeply than losses), and (4) differential treatment of punishments 
(Loewenstein and Prelec, 1992; Frederick et al., 2002; Kalenscher and Pennartz, 
2008), that are not explained by ERT (nor by OFT and DUT). It must also be 
noted that none of the above theories are capable of explaining how animals 
measure delays to rewards, nor do prior theories of time perception (Gibbon 
et al., 1997; Lejeune and Wearden, 2006) attempt to explain intertemporal 
choice. Though psychology and behavioral sciences attempt to rationalize the 
above observations by constructing proximate models invoking phenomena like 
attention, memory, and mood (Frederick et al., 2002; Kalenscher and Pennartz, 
2008; Van den Bos and McClure, 2013), ultimate causes are rarely proposed. As a 
consequence, these models of animal behavior are less parsimonious, and often 
ad-hoc. 

In order to explain behavior, an ultimate theory must consider 
appropriate proximate constraints. The lack of appropriate con- 
straints might explain the inability of the above theories to 
rationalize experimental data. By merely stating that animals 
maximize indefinitely-long-term future reward rates or discounted 
utility, the optimization criteria of OFT and DUT require animals 
to consider the effect of all possible future reward-options when 
making the current choice (Stephens and Krebs, 1986; Kalenscher and 
Pennartz, 2008). However, such a solution would be biologically 
implausible for at least three reasons: (1) animals cannot know all 
the rewards obtainable in the future; (2) even if animals knew the 
disposition of all possible future rewards, the combinatorial 
explosion of such a calculation would present them with an untenable 
computation (e.g., in order to be optimal when performing even 100 
sequential binary choices, an animal will have to consider each of 
the \(2^{100}\) combinations); (3) animals cannot persist for 
indefinitely long intervals without food in the hope of obtaining an 
unusually large reward in the distant future, even if the reward may 
provide the highest long-term reward rate (e.g., option between 
11,000 units of reward in 100 days vs. 10 units of reward in 
0.1 day). On the other hand, ERT, 
although computationally-simple, expects an animal to ignore its 
past reward experience while making the current choice. 

To contend with uncertainties regarding the future, an ani- 
mal could estimate reward rates based on an expectation of the 
environment derived from its past experience. In a world that 
presents large fluctuations in reinforcement statistics over time, 
estimating reinforcement rate using the immediate past has an 
advantage over using longer-term estimations because the corre- 
lation between the immediate past and the immediate future is 
likely high. Hence, our TIMERR theory proposes an algorithm 
for intertemporal choice that aims to maximize expected reward 
rate based on, and constrained by, memory of past reinforcement 
experience. As a consequence, it postulates that time is subjec- 
tively represented such that subjective representation of reward 
rate accurately reflects objective changes in reward rate (see sec- 
tion TIMERR Theory: Time Perception). In doing so, we are 
capable of explaining a wide variety of fundamental observations 
made in intertemporal decision-making and time perception. 
These include hyperbolic discounting (Stephens and Krebs, 1986; 
Stephens and Anderson, 2001; Frederick et al., 2002; Kalenscher 
and Pennartz, 2008), "Magnitude" (Myerson and Green, 1995; 
Frederick et al., 2002; Kalenscher and Pennartz, 2008) and "Sign" 
effects (Frederick et al., 2002; Kalenscher and Pennartz, 2008), 
differential treatment of losses (Frederick et al., 2002; Kalenscher 
and Pennartz, 2008), scaling of timing errors with interval duration 
(Gibbon, 1977; Gibbon et al., 1997; Matell and Meck, 2000; 
Buhusi and Meck, 2005; Lejeune and Wearden, 2006), and observations 
that impulsive subjects (as defined by abnormally steep 
discounting) under-produce (Wittmann and Paulus, 2008) time 
intervals and show larger timing errors (Wittmann et al., 2007; 
Wittmann and Paulus, 2008) (see "Summary" for a full list). It 
thereby recasts the above-mentioned "anomalies" not as flaws, 
but as features of reward-rate optimization under experiential 
constraints. 

MOTIVATION BEHIND THE TIMERR ALGORITHM 

To illustrate the motivation and reasoning behind our theory, 
we consider a simple behavioral task. In this task, an animal 
must make decisions on every trial between two randomly cho- 
sen (among a finite number of possible alternatives) known 
reinforcement-options. Having chosen an option on one trial, 
the animal is required to wait the corresponding delay to obtain 
the reward amount chosen. An example environment with three 
possible reinforcement-options is shown in Figure 1A. We assert 
that the goal of the animal is to gather the maximum total 
reward over a fixed amount of time, or equivalently, to attain 
the maximum total (global) reward rate over a fixed number of 
trials. 

Assuming a stationary reinforcement-environment in which it 
is not possible to directly know the pattern of future reinforce- 
ments, an animal may yet use its past reinforcement experience 
to instruct its current choice. Provisionally, suppose also that an 
animal can store its entire reinforcement-history in the task in its 
memory. So rather than maximizing reward rates into the future 
as envisioned by OFT, the animal can then maximize the total 
reward rate that would be achieved so far (at the end of the cur- 
rent trial). In other words, the animal could pick the option that 
when chosen, would lead to the highest global reward rate over all 
trials until, and including, the current trial, i.e., 

\[ \text{Pick the option with the highest value of } \frac{R + r_i}{T + t_i} \tag{1} \]

where \(T\) is the total time elapsed in the session so far, \(R\) is the total 
reward accumulated so far, and \((r_i, t_i)\) are the reward magnitude 
and delay, respectively, for the various reinforcement-options on 
the current trial. This ordered pair notation will be followed 
throughout the paper. 

Under the above conditions, this algorithm yields the highest 
possible reward rate achievable at the end of any given num- 
ber of trials. In contrast, previous algorithms for intertemporal 
decision-making (hyperbolic discounting, exponential discount- 
ing, two-parameter discounting), while being successful at fitting 
behavioral data, fail to maximize global reward rates. For the 
example reinforcement-environment shown in Figure 1A, simulations 
show that the algorithm in Equation (1) outperforms 
other extant algorithms by more than an order of magnitude 
(Figure 1B). 



The reason why extant alternatives fare poorly is that they 
do not account for opportunity cost, i.e., the cost incurred in 
the lost opportunity to obtain better rewards than currently 
available. In the example considered, two of the reinforcement-options 
are significantly worse than the third (Figure 1C). Hence, 
in a choice between these two options, it is even worth incurring 
a small punishment ($-0.01) at a short delay for sooner 
opportunities of obtaining the best reward ($5) (Figure 1C). 
Previous models, however, pick the reward ($0.1) in favor 
of the punishment since they do not have an estimate of 
opportunity cost. In contrast, by storing the reinforcement 
history, Equation (1) accounts for the opportunity cost, and 
picks the punishment. Recent experimental evidence suggests 
that humans indeed accept small temporary costs in order to 
increase the opportunity for obtaining larger gains (Kolling et al., 
2012). 
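
The choice in Figure 1C can be checked numerically. The sketch below is illustrative only: it assumes a reinforcement history of 100 s at 1.5 $/s (so R = 150 and T = 100), and the function names and the hyperbolic constant `k` are hypothetical, not taken from the Methods.

```python
def global_rate_after(R, T, r, t):
    """Session-wide reward rate if option (r, t) is taken now: (R + r) / (T + t)."""
    return (R + r) / (T + t)


def pick_eq1(R, T, options):
    """Equation (1): index of the option giving the highest global reward rate."""
    return max(range(len(options)),
               key=lambda i: global_rate_after(R, T, *options[i]))


def pick_hyperbolic(options, k=1.0):
    """Plain hyperbolic discounting r / (1 + k * t); uses no reward history."""
    return max(range(len(options)),
               key=lambda i: options[i][0] / (1.0 + k * options[i][1]))


# The two worst options of Figure 1A, as (reward in $, delay in s).
options = [(0.1, 100.0), (-0.01, 1.0)]

# Assumed history (hypothetical): 100 s elapsed at 1.5 $/s.
T, R = 100.0, 150.0

print("Equation (1) picks option", pick_eq1(R, T, options))
print("Hyperbolic rule picks option", pick_hyperbolic(options))
```

Under these assumptions Equation (1) selects the small punishment (index 1), because waiting 100 s for $0.1 would roughly halve the global reward rate, whereas the history-free hyperbolic rule selects the small reward (index 0) for any positive `k`.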

The behavioral task shown in Figure 1A is similar to standard 
laboratory tasks studying intertemporal decisions (Frederick 
et al., 2002; Schweighofer et al., 2006; Kalenscher and Pennartz, 
2008; Stephens, 2008). However, in naturalistic settings, animals 
commonly have the ability to forgo any presented option. 
Further, the number of options presented on a given trial can 
vary and could arise from a large pool of possible options. An 
illustration of such a task is displayed in Figure 1D, showing 
the outcomes of five past decisions. Decision 2 illustrates an 
instance of incurring an opportunity cost. Decision 3 shows the 
presentation of a single option that was forgone, leading to the 
presentation of a better option in decision 4. Though the options 
presented in decision 5 are those in decision 1, the animal's 
choice behavior is the opposite, as a result of changing 
estimations of opportunity cost. Results of performance in such a 
simulated task (with no punishments) are shown in Figure 1E, 
again showing Equation (1) outperforming other models (see 
Methods). 

TIMERR THEORY: INTERTEMPORAL CHOICE 

It is important to note that while the extent to which Equation 
(1) outperforms other models depends on the reinforcement- 
environment under consideration, its performance in a stationary 
environment will be greater than or equal to previous decision 
models. However, biological systems face at least three major 
constraints that limit the appropriateness of Equation (1): (1) 
their reinforcement-environments are non-stationary; (2) inte- 
grating reinforcement-history over arbitrarily long intervals is 
computationally implausible, and, (3) indefinitely long intervals 
without reward cannot be sustained by an animal (while main- 
taining fitness) even if they were to return the highest long-term 
reward rate (e.g., choice between 100,000 units of food in 100 
days vs. 10 units of food in 0.1 day). Hence, in order to be 
biologically-realistic, TIMERR theory states that the interval over 
which reinforcement-history is evaluated, the past-integration-interval 
(\(T_{ime}\); "ime" stands for "in my experience"), is finite. Thus, 
the TIMERR algorithm states that animals maximize reward 
rates over an interval including \(T_{ime}\) and the learned expected 
delay to reward (\(t\)) [Equation (2), Figures 2A,B]. This modification 
renders the decision algorithm shown in Equation (1) 
biologically-plausible. 



Frontiers in Behavioral Neuroscience | www.frontiersin.org | February 2014 | Volume 8 | Article 61 | Namboodiri et al. | Theory: temporal decision-making and perception



[Figure 1 graphic. (A) Task flow chart: choose between two randomly presented options; wait until (and consume) reinforcement of the chosen option. Possible reinforcement options ($, s): 1. (5, 1); 2. (0.1, 100); 3. (-0.01, 1). (B, E) Legend entries: Equation 1, Hyperbolic, Exponential, β-δ. (C) "Which one should be chosen?" and "The importance of opportunity cost": ($-0.01, 1 s) vs. ($0.1, 100 s), with a reward rate obtained so far of about 1.5 $/s.]


FIGURE 1 | A schematic illustrating the problem of intertemporal 
decision-making and the rationale for our solution. (A) Flow chart of 
a simple behavioral task, showing the possible reinforcement options. 
(B) The performance of four decision-making agents using the four 
decision processes as shown in the legend (see Methods). The 
parameters of the three previous models were tuned to attain 
maximum performance. The error bar shows standard deviation. Since 
the decision rules of the three previous models operate only on the 
current trial, the corresponding performances have no variability and 
hence, their standard deviations are zero. (C) Illustration of the 
reason for performance failure, showing a choice between the two worst 
options. The reward rate so far is much higher than the reward rates 
provided by the two options under consideration. Since these models do 
not include a metric of opportunity cost, they pick ($0.1, 100 s). 
However, on average, choosing ($-0.01, 1 s) will provide a larger 
reward at the end of 100 s. (D) A schematic illustrating a more natural 
behavioral task, with choices involving one or two options chosen from 
a total of four known reinforcement-options. The choices made by the 
animal are indicated by the bold line and are numbered 1-5. Here, we 
assume that during the wait to a chosen reinforcement-option, other 
reinforcement-options are not available (see Expected Reward Rate Gain 
during the Wait in Appendix for an extension). Reinforcement-options 
connected by dotted lines are unknown to the animal either because 
they are in the future, or because of the choices made by the animal 
in the past. For instance, deciding to pursue the brown option in the 
second choice causes the animal to lose a large reward, the presence 
of which was unknown at the moment of decision. (E) Performance of 
the models in an example environment as shown in (D) (see Methods 
for details). Error bars for the previous models are not visible at this 
scale. For the environment chosen here, a hyperbolic model (mean 
reward rate = 0.0465) is slightly worse than exponential and β-δ 
models (mean reward rate = 0.0490). 



If the estimated average reward rate over the past integration 
window of \(T_{ime}\) is denoted by \(a_{est}\), the TIMERR algorithm can be 
written as: 

\[ \text{Pick the option with the highest value of } \frac{a_{est} T_{ime} + r_i}{T_{ime} + t_i} \tag{2} \]
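
As a minimal sketch of how Equation (2) could be applied on a single trial, the Python below ranks options by expected reward rate over the window T_ime + t and forgoes all of them when even the best falls at or below the past rate estimate (the forgo condition stated in the Figure 2B caption). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def expected_rate(a_est, t_ime, r, t):
    """Equation (2): expected reward rate over the window t_ime + t."""
    return (a_est * t_ime + r) / (t_ime + t)


def timerr_pick(a_est, t_ime, options):
    """Index of the option maximizing Equation (2), or None to forgo all.

    An option is only worth pursuing if its expected rate exceeds the
    past reward rate estimate a_est, which is equivalent to r > a_est * t.
    """
    best = max(range(len(options)),
               key=lambda i: expected_rate(a_est, t_ime, *options[i]))
    if expected_rate(a_est, t_ime, *options[best]) <= a_est:
        return None
    return best


# Hypothetical values: a_est = 0.05 reward units/s, T_ime = 100 s.
a_est, t_ime = 0.05, 100.0
print(timerr_pick(a_est, t_ime, [(20.0, 100.0), (2.0, 10.0)]))  # two options
print(timerr_pick(a_est, t_ime, [(2.0, 100.0)]))                # single poor option
```

With these numbers the first call selects the larger, later reward (index 0), and the second returns None because a reward of 2 after 100 s is worth less than the 5 units expected from the environment over the same wait.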

Therefore, the TIMERR algorithm acts as a temporally- 
constrained, experience-based, solution to the optimization 
problem of maximizing reward rate. It is thus a better imple- 
mentation of the statement of OFT than prior implementa- 
tions. It requires that only experienced magnitudes and times 
of the rewards following conditioned stimuli are stored, there- 
fore predicting that intertemporal decisions of animals will 
not incorporate post-reward delays due to limitations in associative 
learning (Kacelnik and Bateson, 1996; Stephens and 
Anderson, 2001; Pearson et al., 2010; Blanchard et al., 2013), 
consistent with prior experimental evidence showing the insensitivity 
of choice behavior to post-reward delays (Stephens and 
Anderson, 2001; Kalenscher and Pennartz, 2008; Stephens, 2008; 
Pearson et al., 2010; Blanchard et al., 2013) (see Animals do 
not Maximize Long-Term Reward Rates in Appendix for a 
detailed discussion). It is important to note, however, that 
indirect effects of post-reward delays on behavior (Blanchard 
et al., 2013) can be explained as resulting from the implicit 
effect of post-reward delays on past reward rate; the higher the 
post-reward delays become, the lower will be the past reward 
rate. 

From the TIMERR algorithm, it is possible to derive the sub- 
jective value of a delayed reward (Figure 2C) — defined as the 
amount of immediate reward that is subjectively equivalent to the 
delayed reward. 

This is calculated by asserting that reward rate for \((SV(r, t), 0)\) 
= reward rate for \((r, t)\) 
FIGURE 2 | Solution to the problem of intertemporal choice as 
proposed by TIMERR theory. (A) Past reward rate is estimated (\(a_{est}\)) 
by the animal over a time-scale of \(T_{ime}\) [Calculation of the Estimate of 
Past Reward Rate (\(a_{est}\)) in Appendix]. This estimate is used to evaluate 
whether the expected reward rate upon picking either current option is 
worth the opportunity cost of waiting. (B) The decision algorithm of 
TIMERR theory shows that the option with the highest expected reward 
rate is picked [Equation (2)], so long as this reward rate is higher than 
the past reward rate estimate (\(a_{est}\)). Such an algorithm automatically 
includes the opportunity cost of waiting in the decision. (C) The 
subjective values for the two reward options shown in (A) (time-axis 
scaled for illustration) as derived from the decision algorithm [Equation 
(3)] are plotted. In this illustration, the animal picks the green option. It 
should be noted that even if the orange option were to be presented 
alone, the animal would forgo this option since its subjective value is 
less than zero. Zero subjective value corresponds to ERR = \(a_{est}\). 



\[ \frac{a_{est} T_{ime} + SV(r, t)}{T_{ime}} = \frac{a_{est} T_{ime} + r}{T_{ime} + t} \]


where \(SV(r, t)\) is the subjective value of reward \(r\) delayed by time 
\(t\). Simplifying, the expression for \(SV(r, t)\) is given by 

\[ SV(r, t) = \frac{r - a_{est} t}{1 + \frac{t}{T_{ime}}} \tag{3} \]

where \(a_{est}\) is an estimate of the average reward rate in the past over 
the integration window \(T_{ime}\) with the reward option specified by 
a magnitude \(r\) and a delay \(t\). 
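
The algebra leading to Equation (3) can be spelled out as follows. This is a sketch that assumes the equal-rate condition takes the form of Equation (2), evaluated for an immediate reward \(SV(r, t)\) on the left-hand side and for the delayed option \((r, t)\) on the right-hand side.

```latex
\begin{align*}
\frac{a_{est} T_{ime} + SV(r, t)}{T_{ime}}
  &= \frac{a_{est} T_{ime} + r}{T_{ime} + t} \\
SV(r, t)
  &= \frac{T_{ime}\,(a_{est} T_{ime} + r) - a_{est} T_{ime}\,(T_{ime} + t)}{T_{ime} + t} \\
  &= \frac{T_{ime}\,(r - a_{est} t)}{T_{ime} + t}
   = \frac{r - a_{est} t}{1 + t / T_{ime}}
\end{align*}
```

Setting \(t = 0\) gives \(SV(r, 0) = r\), so an immediate reward is valued at its face magnitude.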

Equation (3) presents an alternative interpretation of the algorithm: 
the animal is estimating the net worth of pursuing each 
delayed reward by subtracting the opportunity cost incurred by 
forfeiting potential alternative reward options during the delay 
to a given reward and normalizing by the explicit temporal cost 
of waiting. This is because the numerator in Equation (3) represents 
the expected reward gain but subtracts this opportunity 
cost, \(a_{est} t\), which corresponds to a baseline expected amount of 
reward that might be acquired over \(t\). The denominator is the 
explicit temporal cost of waiting. 
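
A small numerical check can make this interpretation concrete. The sketch below assumes the form SV(r, t) = (r - a_est t) / (1 + t / T_ime) for Equation (3) and uses hypothetical parameter values; it confirms that an immediate reward of size SV yields the same expected reward rate as the delayed option, and that an option whose reward does not cover the opportunity cost has negative subjective value.

```python
import math


def subjective_value(r, t, a_est, t_ime):
    """Equation (3): (r - opportunity cost a_est * t) / temporal cost (1 + t / t_ime)."""
    return (r - a_est * t) / (1.0 + t / t_ime)


def expected_rate(r, t, a_est, t_ime):
    """Equation (2): expected reward rate over the window t_ime + t."""
    return (a_est * t_ime + r) / (t_ime + t)


# Hypothetical values: a_est = 0.05 reward units/s, T_ime = 100 s.
a_est, t_ime = 0.05, 100.0

sv = subjective_value(20.0, 100.0, a_est, t_ime)
# Indifference: (sv, 0) and (20, 100) give the same expected reward rate.
same_rate = math.isclose(expected_rate(sv, 0.0, a_est, t_ime),
                         expected_rate(20.0, 100.0, a_est, t_ime))
print(sv, same_rate)

# A reward of 2 after 100 s does not cover the 5 units of opportunity cost.
print(subjective_value(2.0, 100.0, a_est, t_ime))
```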



THE TEMPORAL DISCOUNTING FUNCTION 

The temporal discounting function, i.e., the ratio of subjective 
value to the subjective value of the reward when presented 
immediately, is given by [based on Equation (3)] 

\[ D(r, t) = \frac{SV(r, t)}{r} = \frac{1 - \frac{a_{est} t}{r}}{1 + \frac{t}{T_{ime}}} \tag{4} \]




This discounting function is hyperbolic with an additional, 
dynamical (changing with \(a_{est}\)) subtractive term. The effects 
of varying the parameters, viz. the past integration interval 
(\(T_{ime}\)), estimated average reward rate (\(a_{est}\)), and reward 
magnitude (\(r\)), on the discounting function are shown in Figure 3. 
The steepness of this discounting function is directly governed 
by \(T_{ime}\), the past integration interval (Figure 3A). In 
other words, the longer one integrates over the past to estimate 
reinforcement history, the higher the tolerance to delays 
when considering future rewards, thus rationalizing abnormally 
steep discounting (characteristic of impulsivity) as resulting 
from abnormally low values of \(T_{ime}\). As opportunity costs 
(\(a_{est}\)) increase, delayed rewards are discounted more steeply 
(Figure 3B). Also, as the magnitude of the reward increases 
(Figure 3C), the steepness of discounting becomes lower, referred 
to as the "Magnitude Effect" (Myerson and Green, 1995; Frederick 
FIGURE 3 | The dependence of the discounting function on its 
parameters [Equation (4)]. (A) Explicit temporal cost of waiting: As the past 
integration interval (\(T_{ime}\)) increases, the discounting function becomes less 
steep, i.e., the subjective value for a given delayed reward becomes higher 
(\(a_{est} = 0\) and \(r = 20\)). (B) Opportunity cost affects discounting: As \(a_{est}\) 
increases, the opportunity cost of pursuing a delayed reward increases and 
hence, the discounting function becomes steeper. The dotted line indicates 
a subjective value of zero, below which rewards are not pursued, as is the 
case when the delay is too high (\(r = 20\) and \(T_{ime} = 100\)). (C) "Magnitude 
Effect": As the reward magnitude increases, the steepness of discounting 
decreases (Myerson and Green, 1995; Frederick et al., 2002; Kalenscher 
and Pennartz, 2008) (\(T_{ime} = 100\) and \(a_{est} = 0.05\)). (D) "Sign Effect" and 
differential treatment of losses: Gains (green and brown) are discounted 
more steeply than losses (cyan and orange) of equal magnitudes (Frederick et al., 
2002; Kalenscher and Pennartz, 2008) (\(T_{ime} = 100\) and \(a_{est} = 0.05\)). Note 
that as the magnitude of loss decreases, so does the steepness of 
discounting (Figure 4). In fact, for losses with magnitudes lower than \(a_{est} T_{ime}\), 
the discounting function will be greater than 1, leading to a differential 
treatment of losses (Frederick et al., 2002; Kalenscher and Pennartz, 2008) 
(see text, Figure 4). 



et al., 2002; Kalenscher and Pennartz, 2008) in prior experiments. 
Further, it is shown that gains are discounted more 
steeply than losses of equal magnitudes in net positive environments 
(Figure 3D), as shown previously and referred to as the 
"Sign Effect" (Frederick et al., 2002; Kalenscher and Pennartz, 
2008). It must also be pointed out that the discounting function 
for a loss becomes steeper as the magnitude of the loss 
increases, observed previously as the reversal of the "Magnitude 
Effect" for losses (Hardisty et al., 2012) (Figure 4A). In fact, 
when forced to pick a punishment in a net positive environment, 
low-magnitude (below \(a_{est} \times T_{ime}\)) losses will be preferred 
immediately while higher-magnitude losses will be preferred 
when delayed (Figure 4B), as has been experimentally observed 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008; Hardisty 
et al., 2012) (for a full treatment of the effects of changes 
in variables, see Consequences of the Discounting Function in 
Appendix). 
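
The qualitative effects described above can be reproduced with a few evaluations of the discounting function. The sketch below assumes the form D(r, t) = (1 - a_est t / r) / (1 + t / T_ime) for Equation (4); the parameter values mirror those quoted in the Figure 3 caption where available (a_est = 0.05, T_ime = 100, r = 20), and the remaining values are hypothetical.

```python
def discount(r, t, a_est, t_ime):
    """Equation (4): D(r, t) = (1 - a_est * t / r) / (1 + t / t_ime), for r != 0."""
    return (1.0 - a_est * t / r) / (1.0 + t / t_ime)


t = 100.0  # delay at which the curves are compared (hypothetical)

# Steepness is governed by T_ime (a_est = 0, r = 20): longer window, shallower curve.
d_short_window = discount(20.0, t, a_est=0.0, t_ime=50.0)
d_long_window = discount(20.0, t, a_est=0.0, t_ime=500.0)

# Magnitude and sign effects in a net positive environment.
d_gain_small = discount(20.0, t, a_est=0.05, t_ime=100.0)
d_gain_large = discount(100.0, t, a_est=0.05, t_ime=100.0)
d_loss = discount(-20.0, t, a_est=0.05, t_ime=100.0)

# A loss smaller in magnitude than a_est * T_ime = 5 has D > 1.
d_tiny_loss = discount(-2.0, t, a_est=0.05, t_ime=100.0)

print("T_ime effect:    ", d_short_window, "<", d_long_window)
print("Magnitude effect:", d_gain_small, "<", d_gain_large)
print("Sign effect:     ", d_gain_small, "<", d_loss)
print("Small loss:      ", d_tiny_loss, "> 1")
```

A larger value of D at a fixed delay corresponds to shallower discounting, so each printed inequality restates one of the effects in the text.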




[Figure 4 graphic. Horizontal axis of both panels: time until reward (t), 0 to 400.]

FIGURE 4 | "Magnitude Effect" and differential treatment of losses in a 
net positive environment. (A) The discounting function plotted for losses 
of various magnitudes (as shown in Figure 3D; \(a_{est} = 0.05\) and \(T_{ime} = 100\)). 
As the magnitude of a loss increases, the discounting function becomes 
steeper. However, the slope of the discounting steepness with respect to 
the magnitude is minimal for large magnitudes (100 and 1000; see 
Consequences of the Discounting Function in Appendix). At magnitudes 
below \(a_{est} T_{ime}\), the discounting function becomes an increasing function of 
delay. (B) Plot of the signed discounting function for the magnitudes as 
shown in (A), showing that for magnitudes lower than \(a_{est} T_{ime}\), a loss 
becomes even more of a loss when delayed. Hence, at low magnitudes 
(\(< a_{est} T_{ime}\)), losses are preferred immediately. No curve crosses the dotted 
line at zero, showing that at all delays, losses remain punishing. 



TIMERR THEORY: TIME PERCEPTION 

Attributing values to rewards delayed in time necessitates 
representations of those temporal delays. These representations of 
time are subjective, as it is known that time perception varies 
within and across individuals (Gibbon et al., 1997; Matell and 
Meck, 2000; Buhusi and Meck, 2005; Lejeune and Wearden, 2006; 
Wittmann and Paulus, 2008), and that errors in representation of 
time increase with the interval being represented (Gibbon et al., 
1997; Matell and Meck, 2000; Buhusi and Meck, 2005; Lejeune 
and Wearden, 2006). While there are many models that address 
how timing may be implemented in the brain (Gibbon, 1977; 
Killeen and Fetterman, 1988; Matell and Meck, 2000; Buhusi and 
Meck, 2005; Simen et al., 2011a,b), our aim in this section is to 
present an "ultimate" theory of time perception, i.e., a theory of 
the principles behind time perception. 

Since TIMERR theory states that animals seek to maximize 
expected reward rates, we posit that time is represented sub- 
jectively (Figure 5A) so as to result in accurate representations 
of changes in expected reward rate. In other words, subjective 
time is represented so that subjective reward rate (subjective 
value/subjective time) equals the true expected reward rate less 
FIGURE 5 | Subjective time mapping and simulations of performance 
in a time reproduction task. (A) A schematic of the representation of the 
reward-environment by two animals with different values of \(T_{ime}\). Lower 
values of \(T_{ime}\) generate steeper discounting (higher impulsivity), and hence, 
smaller subjective values. (B) Subjective time mapping: The subjective time 
mapping as expressed in Equation (6) is plotted for the two animals in (A). 
Subjective time representation saturates at \(T_{ime}\) for longer intervals. This 
saturation effect is more pronounced in the case of higher impulsivity, 
thereby leading to a reduced ability to discriminate between intervals (here, 
40 and 50 s). (C) Bias in time reproduction: A plot of reproduced median 
intervals for a case of high impulsivity in a simulated time reproduction task 
as generated by the simple accumulator model (see Methods; Figure 6) for 
sample intervals ranging between 1 and 90 s. At longer intervals, there is an 
increasing underproduction. The dashed line indicates perfect reproduction. 
(D) The bias in timing (difference between reproduced interval and sample 
interval) for a 90 s sample interval is shown for different values of \(T_{ime}\), 
demonstrating that as impulsivity reduces, so does underproduction. 



the baseline expected reward rate (flest)- Hence, if the subjective 
representation of time associated with a delay f is denoted by 

sr(f), 

Combining Equation (5) with Equation (3), we get 

sr(o= , .\ (6) 



1 + 



f 



decision-making. It can be seen that the difference in subjective 
time representations between 40 and 50 s is smaller for a lower 
Time (high impulsivity). Hence, higher impulsivity corresponds to 
a reduction in the ability to discriminate between long intervals (a 
decrease in the precision of time representation) (Figures 5A,B). 

Internal time representation has been previously modeled 
using accumulator models (Buhusi and Meek, 2005; Simen et al., 
2011a,b) that incorporate the underlying noisiness in informa- 
tion processing. We used a simple noisy accumulator model (see 
Methods, Figure 6A) that represents subjective time according 
to Equation (6) to simulate a time interval reproduction task 
(Buhusi and Meek, 2005; Lejeune and Wearden, 2006). In this 
model, we assumed that the noise in the slope of the accu- 
mulator was proportional to the square root of the signal and 
that there is a constant read-out noise (see Methods for details). 
Such noise in the accumulator slope (i.e., proportional to the 
square root of the signal) occurs in spiking neuronal models that 
assume Poisson statistics, having been used in prior accumulator 
models (Simen et al., 2011b). The results of time interval repro- 
duction simulations (see Methods) are shown in Figures 5C,D. 
Lower values of Time correspond to an underproduction of time 
intervals (i.e., decreased accuracy of reproduction), with the mag- 
nitude of underproduction increasing with increasing durations 
of the sample interval (Figure 5C). When attempting to repro- 
duce a 90 s sample interval, the magnitude of underproduction 
decreases with increases in Timo C)r equivalently, with decreas- 
ing impulsivity (Figure 5D). These predictions are supported by 
prior experimental evidence (Wittmann and Paulus, 2008). 

ERRORS IN TIME PERCEPTION 

Prior studies have observed that the error in representation of 
intervals increases with their durations (Gibbon et al., 1997; 
Matell and Meek, 2000; Buhusi and Meek, 2005; Lejeune and 
Wearden, 2006). Such an observation is consistent with the 
subjective time representation presented here (Figures 5A,B)- 
TIMERR theory predicts that the representation errors will be 
larger when Time is smaller (higher impulsivity) (Figures 5A,B), 
as observed experimentally (Wittmann et al., 2007; Wittmann 
and Paulus, 2008). Prior studies investigating the relationship 
between time duration and reproduction error have observed a 
linear scaling ("scalar timing") within a limited range (Gibbon 
et al, 1997; Matell and Meek, 2000; Buhusi and Meek, 2005; 
Lejeune and Wearden, 2006). 

Calculating the error in reproduced intervals by the accu- 
mulator model mentioned above cannot be done analytically. 
However, we present an approximate analytical solution below. 
Assuming that the representation of subjective time, ST(t), has a 
constant infinitesimal noise of dST{t) associated with it, the noise 
in representation of a true interval t, denoted as dt wUl obey 



Such a representation has the property of being bounded 
[Sr(oo) = Time]; thereby making it possible to represent very 
long durations within the finite dynamic ranges of neuronal firing 
rates. Plots of the subjective time representation of delays between 
1 and 90 s are shown in Figure 5B for two different values of Time- 
As mentioned previously (Figure 3A), a lower value of Time cor- 
responds to steeper discounting, characteristic of more impulsive 



dST (t) 
dt 



1 + 



-ime / / 



(l + T^) 

\ ^ ime / 



FIGURE 6 | Noisy accumulator model (see Methods). (A) The subjective
representation of time, as plotted in Figure 5B, is simulated using a noisy
accumulator model as described in Methods. The accumulated value at
the interval being timed (here 90 s) is stored in memory and used
as a threshold for later time reproduction. The reproduced interval (as in
Figures 5C,D) is defined by the moment of first threshold-crossing. (B) A
plot of the scaling of noise in the accumulator with the signal. The y-axis is
the standard deviation of the accumulated signal at every ST(t) shown in
the x-axis. The standard deviation was calculated by running the
accumulator 2000 times. The near-linear relationship seen here is used to
calculate an approximate analytical solution for the error in the
representation of subjective time as shown in Equation (8). (C) Plot of the
coefficient of variation (C_v) of reproduced intervals (measurement of
precision) with respect to the interval being reproduced shows a
near-constant value over a large range of durations for T_ime = 300 s. An
analytical approximation is expressed in Equation (8). Each data point is the
result of averaging over 2000 trials.

If one assumes that the neural noise in representing ST(t) is linearly
related to the signal, with a term proportional to the signal
in addition to a constant noise [i.e., δST(t) = kST(t) + c], then
the corresponding error in real time is

$$\delta t = k\left(t + \frac{t^2}{T_{ime}}\right) + c\left(1 + \frac{t}{T_{ime}}\right)^2 \qquad (7)$$

The coefficient of variation (error/central tendency) expected
from such a model is then

$$C_v = \frac{\delta t}{t} = \frac{k\left(t + \frac{t^2}{T_{ime}}\right) + c\left(1 + \frac{t}{T_{ime}}\right)^2}{t}$$

This can be simplified as

$$C_v = k\left(1 + \frac{t}{T_{ime}}\right) + \frac{c}{t}\left(1 + \frac{t}{T_{ime}}\right)^2 \approx k\left(1 + \frac{t}{T_{ime}}\right) + \frac{c}{t} \qquad (8)$$

where the approximation holds when c/T_ime is small compared to k.

In the above expression, c can be thought of as a constant additive
noise in the memory of subjective representation of time,
ST(t), whereas the noise proportional to the signal could result
from fluctuations in the slope of accumulation. In fact, for the
accumulator mentioned above (that exhibits a square root dependence
of the noise in slope with respect to the signal), the net
relationship between the noise of the signal and the signal itself
is approximately linear (Figure 6B). Hence, our earlier assumption
is a good approximation to the more realistic, yet analytically
intractable, accumulator model considered above. The results of
numerical simulations on C_v are shown in Figure 6C, showing a
near-constant value for a large range of sample durations.
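
As a rough numerical illustration of Equations (7) and (8), the
following Python sketch evaluates the error in real time and the
coefficient of variation over several intervals. The function names
and the parameter values (k = 0.1, c = 0.001, T_ime = 300 s) are
assumptions chosen for illustration only; they are not fitted values
from this work, and the sketch is not the noisy accumulator simulation
itself.

```python
def timing_error(t, T_ime, k, c):
    """Error in real time, Equation (7), for noise k*ST(t) + c in subjective time."""
    stretch = 1.0 + t / T_ime  # the factor (1 + t/T_ime)
    return k * t * stretch + c * stretch ** 2


def coefficient_of_variation(t, T_ime, k, c):
    """Coefficient of variation: error in real time divided by the interval."""
    return timing_error(t, T_ime, k, c) / t


if __name__ == "__main__":
    k, c, T_ime = 0.1, 0.001, 300.0  # illustrative assumptions
    for t in (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0):
        cv = coefficient_of_variation(t, T_ime, k, c)
        print(f"t = {t:8.2f} s   C_v = {cv:.4f}")
```

Under these assumed values, C_v first falls as t grows from very short
intervals and then rises again at long intervals, with a broad region
in between where it changes little.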

The above equation results in a U-shaped C_v curve. If the constant
additive noise (c) is small compared to the linear noise, the
second term will dominate only for very low time intervals. At
these very low time intervals, this will lead to a decrease in C_v as
durations increase from zero. At longer intervals, C_v will appear
to be a constant before a linearly increasing range. Importantly,
the slope of the linear range will depend on the value of T_ime.
Hence, though the accumulator model considered here predicts
an increase in C_v at long intervals, it nonetheless will appear
constant within a range determined by T_ime. For larger values of
T_ime, C_v will tend toward a constant. For the simulations shown
in Figure 6C with a T_ime of 300 s, C_v is near constant over a
very wide range of durations. While C_v is generally considered
to be a constant, experimental evidence examining a wide range
of sample durations analyzed across many studies (Gibbon et al.,
1997; Bizo et al., 2006) accords with the specific prediction of a
U-shaped coefficient of variation (spread/central tendency) for
the production times [Equation (8)]. We do note, however, that
a more realistic model representing neural processing could lead
to quantitative deviations from the simple approximations
presented here. Such involved calculations are beyond the scope of
this work. Nevertheless, the most important falsifiable prediction
of our theory regarding timing is that the error in time
perception will show quantitative deviations from Weber's law in
impulsive subjects (with aberrantly low values of T_ime). It must
also be emphasized that the above equations only apply within an
individual subject when T_ime can be assumed to be a constant,
independent of the durations being tested. Pooling data across
different subjects, as is common, would lead to averaging across
different values of T_ime, and hence a flattening of the C_v curve.

TEMPORAL BISECTION

Time perception is also studied using temporal bisection
experiments (Allan and Gibbon, 1991; Lejeune and Wearden, 2006;
Baumann and Odum, 2012) in which subjects categorize a sample
interval as closer to a short (t_s) or a long (t_l) reference interval.
The sample interval at which subjects show maximum uncertainty
in classification as short or long is called the point of
subjective equality, or, the "bisection point." The bisection point
is of considerable theoretical interest. If subjects perceived time
linearly with constant errors, the bisection point would be the
arithmetic mean of the short and long intervals. On the other
hand, if subjects perceived time in a scalar or logarithmic fashion
or used a ratio-rule under linear mappings, it has been proposed
that the bisection point would be at the geometric mean
(Allan and Gibbon, 1991). However, experiments studying temporal
bisection have produced ambiguous results. Specifically, the
bisection point has been shown to vary between the geometric
mean and the arithmetic mean and has sometimes even been
shown to be below the geometric mean, closer to the harmonic
mean (Killeen et al., 1997).

The bisection point as calculated by TIMERR theory is derived 
below. The calculation involves transforming both the short and 
long intervals into subjective time representations and expressing 
the bisection point in subjective time (subjective bisection point) 
as the mean of these two subjective representations. The bisection 
point expressed in real time is then calculated as the inverse of the 
subjective bisection point. 



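$$ST(t_s) = \frac{t_s}{1 + \frac{t_s}{T_{ime}}}, \qquad ST(t_l) = \frac{t_l}{1 + \frac{t_l}{T_{ime}}}$$

Therefore, the bisection point in subjective time is given by

$$\text{Subjective bisection point (SBP)} = \frac{ST(t_s) + ST(t_l)}{2} = \frac{1}{2}\left(\frac{t_s}{1 + \frac{t_s}{T_{ime}}} + \frac{t_l}{1 + \frac{t_l}{T_{ime}}}\right)$$

The value of the bisection point expressed in real time is given by
the inverse of the subjective bisection point, viz.

$$\text{Bisection point in real time} = \frac{SBP}{1 - \frac{SBP}{T_{ime}}} = \frac{T_{ime}\left(\frac{t_s + t_l}{2}\right) + t_s t_l}{T_{ime} + \left(\frac{t_s + t_l}{2}\right)} \qquad (9)$$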



From the above expression, it can be seen that the bisection
point can theoretically vary between the harmonic mean and
the arithmetic mean as T_ime varies between zero and infinity,
respectively.
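
The limits stated above can be checked numerically. The Python sketch
below computes the bisection point in two ways: by mapping both
reference intervals through Equation (6), averaging, and inverting the
mapping, and by the closed form of Equation (9). The function names and
the example intervals (2 s and 8 s) are assumptions made for
illustration.

```python
def subjective_time(t, T_ime):
    """Subjective representation of a delay t, Equation (6)."""
    return t / (1.0 + t / T_ime)


def real_time(st, T_ime):
    """Inverse of Equation (6); defined for 0 <= st < T_ime."""
    return st / (1.0 - st / T_ime)


def bisection_point(t_s, t_l, T_ime):
    """Average the two subjective representations, then map back to real time."""
    sbp = 0.5 * (subjective_time(t_s, T_ime) + subjective_time(t_l, T_ime))
    return real_time(sbp, T_ime)


def bisection_point_closed_form(t_s, t_l, T_ime):
    """Closed form of the same quantity, Equation (9)."""
    mean = 0.5 * (t_s + t_l)
    return (T_ime * mean + t_s * t_l) / (T_ime + mean)


if __name__ == "__main__":
    t_s, t_l = 2.0, 8.0  # illustrative reference intervals, in seconds
    for T_ime in (0.01, 4.0, 50.0, 1e6):
        print(T_ime, bisection_point_closed_form(t_s, t_l, T_ime))
```

For these example intervals the harmonic, geometric, and arithmetic
means are 3.2, 4, and 5 s. In this sketch the bisection point
approaches 3.2 s for very small T_ime, approaches 5 s for very large
T_ime, and equals the geometric mean when T_ime itself equals the
geometric mean of the two intervals.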

Hence, TIMERR theory predicts that when comparing bisection
points across individuals, individuals with larger values of
T_ime will show bisection points closer to the arithmetic mean
whereas individuals with smaller values of T_ime will show lower
bisection points, closer to the geometric mean. If T_ime was smaller
still, the bisection point would be lower than the geometric mean,
approaching the harmonic mean. This is in accordance with
the experimental evidence mentioned above showing bisection
points between the harmonic and arithmetic means (Allan and
Gibbon, 1991; Killeen et al., 1997; Baumann and Odum, 2012).
Further, we also predict that the steeper the discounting function,
the lower the bisection point, as has been experimentally
confirmed (Baumann and Odum, 2012). Predictions similar to
ours have been made previously (Balci et al., 2011) regarding the
location of the bisection point by assuming variability in tempo- 
ral precision. If one assumes that impulsive subjects show larger 
timing errors, the previous model can also explain a reduction in 
the bisection point for subjects showing steeper discounting func- 
tions. However, it must be pointed out that the key contribution 
of our work is in deriving this result. This relationship is not an 
assumption in our work, but rather is an integral part of its con- 
tribution [see Equation (8) for relationship between impulsivity 
and Cv]. 

SUMMARY: PREDICTIONS OF TIMERR THEORY SUPPORTED 
BY EXPERIMENTS 

All the predictions mentioned below result from Equations (3) 
and (6). 

1. The discounting function will be hyperbolic in form 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008). 

2. The discounting steepness could be labile within and across
individuals (Loewenstein and Prelec, 1992; Frederick et al.,
2002; Schweighofer et al., 2006; Luhmann et al., 2008; Van
den Bos and McClure, 2013).

3. Temporal discounting could be steeper when average delays
to expected rewards are lower (Frederick et al., 2002;
Schweighofer et al., 2006; Luhmann et al., 2008) [see Effects
of Plasticity in the Past Integration Interval (T_ime)].

4. "Magnitude Effect": as reward magnitudes increase in a net 
positive environment, the discounting function becomes less 
steep (Frederick et al., 2002; Kalenscher and Pennartz, 2008) 
(Figure 3C). 

5. "Sign Effect": rewards are discounted steeper than punish- 
ments of equal magnitudes in net positive environments 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008). 

6. The "Sign Effect" will be larger for smaller magnitudes 
(Loewenstein and Prelec, 1992; Frederick et al., 2002) (see 
Consequences of the Discounting Function in Appendix). 

7. "Magnitude Effect" for losses: as the magnitudes of losses
increase, the discounting becomes steeper. This is in the
reverse direction as the effect for gains (Hardisty et al., 2012).
Such an effect is more pronounced for lower magnitudes
(Hardisty et al., 2012) (see Consequences of the Discounting
Function in Appendix).

8. Punishments are treated differently depending upon their
magnitudes. Higher magnitude punishments are preferred at
a delay, while lower magnitude punishments are preferred
immediately (Loewenstein and Prelec, 1992; Frederick et al.,
2002; Kalenscher and Pennartz, 2008) (Figure 4).

9. "Delay-Speedup" asymmetry: Delaying a reward that you
have already obtained is more punishing than speeding up
the delivery of the same reward from that delay is rewarding.
This is because a received reward will be included in the
current estimate of past reward rate (a_est) and hence, will
be included in the opportunity cost (Frederick et al., 2002;
Kalenscher and Pennartz, 2008).

10. Time perception and temporal discounting are correlated 
(Wittmann and Paulus, 2008). 

11. Timing errors increase with the duration of intervals (Gibbon
et al., 1997; Matell and Meck, 2000; Buhusi and Meck, 2005;
Lejeune and Wearden, 2006).

12. Timing errors increase in such a way that the coefficient of
variation follows a U-shaped curve (Gibbon et al., 1997; Bizo
et al., 2006).

13. Impulsivity (as characterized by abnormally steep temporal
discounting) leads to abnormally large timing errors
(Wittmann et al., 2007; Wittmann and Paulus, 2008).

14. Impulsivity leads to underproduction of time intervals, with 
the magnitude of underproduction increasing with the dura- 
tion of the interval (Wittmann and Paulus, 2008). 

15. The bisection point in temporal bisection experiments will
be between the harmonic and arithmetic means of the reference
durations (Allan and Gibbon, 1991; Killeen et al., 1997;
Baumann and Odum, 2012).

16. The bisection point need not be constant within and across
individuals (Baumann and Odum, 2012).

17. The bisection point will be lower for individuals with steeper
discounting (Baumann and Odum, 2012).

18. The choice behavior for impulsive individuals will be more
inconsistent than for normal individuals (Evenden, 1999).
This is because their past reward rate estimates will show
larger fluctuations due to a lower past integration interval.

19. Post-reward delays will not be directly included in the
intertemporal decisions of animals during typical laboratory
tasks (Stephens and Anderson, 2001; Kalenscher and
Pennartz, 2008; Stephens, 2008; Pearson et al., 2010). Variants
of typical laboratory tasks may, however, lead to the inclusion
of post-reward delays in decisions (Stephens and
Anderson, 2001; Kalenscher and Pennartz, 2008; Stephens,
2008; Pearson et al., 2010). Post-reward delays can further
indirectly affect decisions as they affect the past reward rate
(Blanchard et al., 2013).
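
Several of the discounting predictions listed above can be checked
directly from Equation (3). The Python sketch below does so for the
hyperbolic form, the "Magnitude Effect", the "Sign Effect", and the
differential treatment of small and large losses. The function names
and the parameter values (a_est = 1 reward unit per second,
T_ime = 50 s) are assumptions for illustration, and the discounting
function is taken here to be the subjective value divided by the
reward magnitude.

```python
def subjective_value(r, t, a_est, T_ime):
    """Subjective value of a reward r available after a delay t, Equation (3)."""
    return (r - a_est * t) / (1.0 + t / T_ime)


def discounting(r, t, a_est, T_ime):
    """Subjective value relative to the same outcome delivered immediately."""
    return subjective_value(r, t, a_est, T_ime) / r


if __name__ == "__main__":
    a_est, T_ime, t = 1.0, 50.0, 5.0  # illustrative assumptions

    # Hyperbolic form: with no opportunity cost, D = 1 / (1 + t / T_ime).
    print("hyperbolic:", discounting(10.0, 50.0, 0.0, T_ime))

    # Magnitude effect: larger gains are discounted less steeply.
    print("gain 10 vs 100:", discounting(10.0, t, a_est, T_ime),
          discounting(100.0, t, a_est, T_ime))

    # Sign effect: a gain is discounted more steeply than an equal loss.
    print("gain 10 vs loss 10:", discounting(10.0, t, a_est, T_ime),
          discounting(-10.0, t, a_est, T_ime))

    # Losses smaller than a_est * T_ime become worse when delayed,
    # whereas larger losses become less bad when delayed.
    print("small loss:", subjective_value(-10.0, 20.0, a_est, T_ime))
    print("large loss:", subjective_value(-1000.0, 20.0, a_est, T_ime))
```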

DISCUSSION 

Our theory provides a simple algorithm for decision-making in 
time. The algorithm of TIMERR theory, in its computational sim- 
plicity, could explain results on intertemporal choice observed 
across the animal kingdom (Stephens and Krebs, 1986; Frederick 
et al., 2002; Kalenscher and Pennartz, 2008), from insects to 
humans. Higher animals, of course, could evaluate subjective 
values with greater sophistication to build better models of the 
world including predictable statistical patterns of the environ- 
ment and estimates of risks involved in waiting (Extensions 
of TIMERR Theory in Appendix). It must also be noted that 
other known variables influencing subjective value like satiety 
(Stephens and Krebs, 1986; Doya, 2008), the non-linear utility 
of reward magnitudes (Stephens and Krebs, 1986; Doya, 2008) 
and the non-linear dependence of health/fitness on reward rates 



(Stephens and Krebs, 1986) have been ignored. Such factors, 
however, can be included as part of an extension of TIMERR theory
while maintaining its inherent computational simplicity. We
derived a generalized expression of subjective value that includes
such additional factors [Equation (A7)], capturing even more
variability in observed experimental results (Frederick et al., 2002;
Kalenscher and Pennartz, 2008) (Non-Linearities in Subjective
Value Estimation to Generalized Expression for Subjective Value
in Appendix). It must also be noted that while we have ignored
the effects of variability in either delays or magnitudes,
explanations of such effects have previously been proposed (Gibbon
et al., 1988; Kacelnik and Bateson, 1996) and are not in conflict
with our theory. Also, since the exclusion of post-reward
delays in decisions in TIMERR theory is borne out of lim- 
itations of associative learning, it allows for the inclusion of 
these delays in tasks where they can be learned. Presumably, 
an explicit cue indicating the end of post-reward delays could 
foster a representation and inclusion of these delays in decisions.
Accordingly, it has been shown in recent experiments
that monkeys include post-reward delays in their decisions when
they are explicitly cued (Pearson et al., 2010; Blanchard et al.,
2013).

In environments with time-dependent changes of reinforce- 
ment statistics, animals should have an appropriately sized 
past integration interval depending on the environment so as 
to appropriately estimate opportunity costs [e.g., integrating 
reward-history from the onset of winter would be highly mal- 
adaptive in order to evaluate the opportunity cost associated with 
a delay of an hour in the summer; also see Effects of Plasticity 
in the Past Integration Interval (T_ime) in Appendix]. In keeping
with the expectation that animals can adapt past integration
intervals to their environment, it has been shown that humans
can adaptively assign different weights to previous decision
outcomes based on the environment (Behrens et al., 2007; Rushworth
and Behrens, 2008). As Equations (3) and (4) show (Figure 3A),
changes in T_ime would correspondingly affect the steepness of
discounting. This novel prediction has two major implications for
behavior: (1) the discounting steepness of an individual need not
be a constant, as has sometimes been implied in prior literature
(Frederick et al., 2002); (2) the longer the past integration interval,
the higher the tolerance to delays when considering future
rewards. In accordance with the former prediction, several recent
reviews have suggested that discounting rates are variable within
and across individuals (Loewenstein and Prelec, 1992; Frederick
et al., 2002; Schweighofer et al., 2006; Luhmann et al., 2008;
Van den Bos and McClure, 2013). The latter prediction states
that impulsivity (Evenden, 1999), as characterized by abnormally
steep discounting, could be the result of abnormally short windows
of past reward rate integration. This may explain the observation
that discounting becomes less steep as individuals develop
in age (Peters and Büchel, 2011), should the longevity of memories
increase over development. Past integration intervals could
also be related to and bounded by the span of working memory.
In fact, recent studies have shown that working memory and
temporal discounting are correlated within subjects (Shamosh
et al., 2008; Bickel et al., 2011) and also that improving working
memory capacity decreases the steepness of discounting in
stimulant addicts (Bickel et al., 2011). Further, Equation (6) states
that changes in T_ime would lead to corresponding changes in
subjective representations of time. Hence, we predict that perceived
durations may be linked to experienced reward environments, i.e.,
"time flies when you're having fun."

It is important to point out that the TIMERR algorithm for 
decision-making only depends on the calculation of the expected 
reward rate, as shown in Figure 2B. While this algorithm is 
mathematically equivalent to picking the option with the highest 
subjective value [Equation (3)], the discounting of delayed rewards
results purely from the effect of those delays on the expected
reward rate. Hence, as has been previously proposed (Pearson
et al., 2010; Blanchard et al., 2013), we do not think of the
discounting steepness as a psychological constant of an individual.
Instead, we posit that apparent discounting functions are the
consequence of maximizing temporally-constrained expected reward
rates, and that abnormalities in temporal discounting result from
abnormal adaptations of T_ime.
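
The equivalence between maximizing expected reward rate and picking
the highest subjective value can be sketched in Python as follows. The
rate expression used here, (a_est T_ime + r) / (T_ime + t), is our
reading of "reward rate over a window made up of the past integration
interval plus the delay to reward" and should be treated as an
assumption, as should the function names and the values a_est = 0.05
and T_ime = 100. The four example options are the (r, t) pairs listed
in Methods for Figure 1E.

```python
def expected_reward_rate(r, t, a_est, T_ime):
    """Reward accumulated over the past window plus r, per unit of T_ime + t."""
    return (a_est * T_ime + r) / (T_ime + t)


def subjective_value(r, t, a_est, T_ime):
    """Subjective value of a delayed reward, Equation (3)."""
    return (r - a_est * t) / (1.0 + t / T_ime)


def choose(options, a_est, T_ime):
    """Pick the (r, t) option that maximizes the expected reward rate."""
    return max(options, key=lambda o: expected_reward_rate(o[0], o[1], a_est, T_ime))


if __name__ == "__main__":
    options = [(0.1, 100.0), (0.0001, 2.0), (5.0, 2.0), (5.0, 150.0)]
    a_est, T_ime = 0.05, 100.0  # illustrative assumptions
    for r, t in options:
        rate = expected_reward_rate(r, t, a_est, T_ime)
        sv = subjective_value(r, t, a_est, T_ime)
        # Under the assumed rate expression, SV = T_ime * (rate - a_est),
        # so both quantities rank the options identically.
        print((r, t), round(rate, 5), round(sv, 5), round(T_ime * (rate - a_est), 5))
    print("chosen:", choose(options, a_est, T_ime))
```

Because the subjective value in this sketch is an increasing linear
function of the expected reward rate, no separate discounting step is
needed to reproduce the choice.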

Reward magnitudes and delays have been shown to be represented
by neuromodulatory and cortical systems (Platt and
Glimcher, 1999; Shuler and Bear, 2006; Kobayashi and Schultz,
2008), while neurons integrating cost and benefit to represent
subjective values have also been observed (Kalenscher et al., 2005;
Kennerley et al., 2006). Recent reward rate estimation (a_est) has
been proposed to be embodied by dopamine levels over long
time-scales (Niv et al., 2007). Interestingly, it has been shown that
administration of dopaminergic agonists (antagonists) leads to
underproduction (overproduction) (Matell et al., 2006) of time
intervals, consistent with a relationship between recent reward
rate estimation and subjective time representation as proposed
here. Average values of foraging environment have also been
shown to be represented in the anterior cingulate cortex (Kolling
et al., 2012). In light of these experimental observations,
neurobiological models have previously proposed that decisions, similar
to our theory, result from the net balance between values of the
options currently under consideration and the environment as
a whole (Kennerley et al., 2006; Kolling et al., 2012). However,
these models do not propose that the effective interval (T_ime) over
which average reward rates are calculated directly determines the
steepness of temporal discounting.

While there have been previous models that connect time per- 
ception to temporal decision making (Staddon and Cerutti, 2003; 
Takahashi, 2006; Balci et al., 2011; Ray and Bossaerts, 2011),
TIMERR theory is the first unified theory of intertemporal choice
and time perception to capture such a wide array of experimental
observations including, but not limited to, hyperbolic
discounting (Stephens and Krebs, 1986; Stephens and Anderson,
2001; Frederick et al., 2002; Kalenscher and Pennartz, 2008),
"Magnitude" (Myerson and Green, 1995; Frederick et al., 2002;
Kalenscher and Pennartz, 2008) and "Sign" effects (Frederick
et al., 2002; Kalenscher and Pennartz, 2008), differential treatment
of losses (Frederick et al., 2002; Kalenscher and Pennartz,
2008), as well as correlations between temporal discounting,
time perception (Wittmann and Paulus, 2008), and timing
errors (Gibbon et al., 1997; Matell and Meck, 2000; Buhusi and
Meck, 2005; Lejeune and Wearden, 2006; Wittmann et al., 2007;
Wittmann and Paulus, 2008) (see "Summary" for a full list).



While the notion of opportunity cost long precedes TIMERR, 
TIMERR's unique contribution is in stating that the past inte- 
gration interval over which opportunity cost is estimated directly 
determines the steepness of temporal discounting and the non- 
linearity of time perception. This is the major falsifiable predic- 
tion of TIMERR. As a direct result, TIMERR theory suggests 
that the spectra of aberrant timing behavior seen in
cognitive/behavioral disorders (Buhusi and Meck, 2005; Wittmann
et al., 2007; Wittmann and Paulus, 2008) (Parkinson's disease,
schizophrenia, and stimulant addiction) can be rationalized as 
a consequence of aberrant integration over experienced reward 
history. Hence, TIMERR theory has major implications for the 
study (see Implications for Intertemporal Choice in Appendix) 
of decision-making in time and time perception in normal and 
clinical populations. 

METHODS 

All simulations were run using MATLAB R2010a. 
SIMULATIONS FOR FIGURE 1 

Figure 1B: Each of the four decision-making agents ran a total of
100 trials. This was repeated 10 times to get the mean and standard
deviation. Every trial consisted of the presentation of two
reinforcement-options randomly chosen from the three possible
alternatives as shown in Figure 1A.

Figure 1E: The following four possible reward-options were
considered, expressed as (r, t): (0.1, 100), (0.0001, 2), (5, 2),
(5, 150). The units are arbitrary. To create the reinforcement- 
environment, a Poisson-process was generated for the availability- 
times of each of the four options. These times were binned into 
bins of size 1 unit, such that each time bin could consist of zero 
to four reward-options. The rate of occurrence for each option 
was set equally to 0.2 events/unit of time. For the three pre- 
vious decision-making models, the parameters were tuned for 
maximum performance by trial and error. Forgoing an avail- 
able reward-option was not possible for these models since their 
subjective values are always greater than zero for rewards. 
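
The construction of the reinforcement-environment for Figure 1E can be
sketched in Python as below (the original simulations were run in
MATLAB, and this is not that code). Each option's availability times
are drawn from a Poisson process by summing exponential inter-event
intervals, and the times are then binned into unit-width bins. The
function names, the horizon of 5000 time units, and the random seed
are assumptions made for illustration.

```python
import random


def poisson_event_times(rate, horizon, rng):
    """Event times of a Poisson process with the given rate on [0, horizon)."""
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def build_environment(n_options, rate, horizon, rng):
    """For each unit-width time bin, the set of option indices available in it."""
    bins = [set() for _ in range(horizon)]
    for option in range(n_options):
        for t in poisson_event_times(rate, horizon, rng):
            bins[int(t)].add(option)  # bin width of 1 time unit
    return bins


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for reproducibility of the sketch
    env = build_environment(n_options=4, rate=0.2, horizon=5000, rng=rng)
    counts = [0] * 5
    for available in env:
        counts[len(available)] += 1
    print("bins with 0..4 options available:", counts)
```

Each bin can hold between zero and four options, matching the
description above; how an agent then chooses among the available
options is not part of this sketch.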

SIMULATIONS FOR FIGURES 5, 6 

An accumulator model described by the following equation was 
used for simulations of a time reproduction task. 

$$dST(t) = \frac{dt}{\left(1 + \frac{t}{T_{ime}}\right)^2} + \sigma \sqrt{ST(t)}\, dW_t$$

where W_t is a standard Wiener process and σ is the magnitude of
the noise. σ was set to 10%. Without the noise term in the R.H.S,
this equation is consistent with the subjective time expression
shown in Equation (6) since integrating for ST(t) exactly yields
Equation (6). This equation can also be rewritten to be in terms
of ST(t) as below.

$$dST(t) = \left(1 - \frac{ST(t)}{T_{ime}}\right)^2 dt + \sigma \sqrt{ST(t)}\, dW_t$$

The above equation was integrated using the Euler-Maruyama
method. In this method, ST(t) is updated using the following
equation for a random walk

$$ST(t + \Delta t) = ST(t) + \left(1 - \frac{ST(t)}{T_{ime}}\right)^2 \Delta t + \sigma \sqrt{ST(t)}\, \sqrt{\Delta t}\; N(0, 1)$$

where N(0, 1) is the standard normal distribution. The step size
for integration, Δt, was set so that there were 1000 steps for
every simulated duration in the time interval reproduction task
(Figures 5, 6).

Every trial in the time reproduction task consisted of two 
phases: a time measurement phase and a time production phase. 
During the time measurement phase, the accumulator inte- 
grates subjective time until the expiration of the sample duration 
(Figure 6A). The subjective time value at the end of the sample 
duration is stored in memory after the addition of a constant 
Gaussian noise as the threshold for time production, i.e.,

$$\text{Threshold}(t) = ST(t) + c\, N(0, 1)$$

During the time production phase, the accumulator integrates
subjective time until the threshold is crossed for the first time.
This moment of first crossing represents the action response
indicating the end of the sample duration, i.e.,

$$\text{Reproduced interval} = \min\{t' : ST(t') > \text{Threshold}(t)\}$$
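
The trial procedure described in this section can be sketched in
Python as follows (the original simulations were run in MATLAB, and
this is not that code). The sketch uses the Euler-Maruyama update
written in terms of ST(t), with noise magnitude 0.1 and read-out noise
c = 0.001 as reported in this section. Several details are assumptions
of the sketch: the function names, T_ime = 50 s, the reading that the
step size equals the sample duration divided by 1000, the clamping of
ST(t) at zero inside the square root, and the use of ">=" for the
threshold test so that a noise-free run reproduces the sample exactly.

```python
import math
import random
import statistics


def subjective_time(t, T_ime):
    """Noise-free subjective time, Equation (6)."""
    return t / (1.0 + t / T_ime)


def accumulate(duration, T_ime, sigma, dt, rng, threshold=None):
    """Euler-Maruyama integration of the noisy accumulator.

    Without a threshold (measurement phase), returns ST at `duration`.
    With a threshold (production phase), returns the first time at which
    ST >= threshold, or `duration` if the threshold is never reached.
    """
    st = 0.0
    n_steps = int(round(duration / dt))
    for step in range(1, n_steps + 1):
        drift = (1.0 - st / T_ime) ** 2 * dt
        # Clamp at zero so the square root stays defined if noise drives
        # ST below zero (an assumption of this sketch).
        noise = sigma * math.sqrt(max(st, 0.0)) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        st += drift + noise
        if threshold is not None and st >= threshold:
            return step * dt
    return st if threshold is None else duration


def reproduce_interval(sample, T_ime, sigma=0.1, c=0.001, steps=1000, rng=random):
    """One trial: time the sample, store a noisy threshold, then reproduce it."""
    dt = sample / steps
    threshold = accumulate(sample, T_ime, sigma, dt, rng) + c * rng.gauss(0.0, 1.0)
    # Production is cut off at 10 times the sample duration.
    return accumulate(10.0 * sample, T_ime, sigma, dt, rng, threshold=threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [reproduce_interval(90.0, 50.0, rng=rng) for _ in range(50)]
    print("median reproduced interval (s):", statistics.median(trials))
```

The demonstration uses 50 trials to keep the run short; the number of
trials and the range of sample durations can be raised to the values
reported in this section.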

For the simulations resulting in Figures 5C,D, 6, σ = 0.1 and c =
0.001. For Figure 5C, sample interval durations ranged between
1 and 90 s over bins of 1 s. A total of 2000 trials were performed
for each combination of sample duration and T_ime to calculate
the median production interval as shown in Figures 5C,D. While
calculating the moment of reproduction, the integration was
carried out up to a maximum time equaling 10 times the sample
duration.

APPENDIX
RESULTS 

Extensions of TIMERR theory 

Alternative version (store reward rates evaluated upon the 
receipt of reward in memory). An alternative version of TIMERR 
theory could be appropriate for very simple life forms with lim- 
ited computational resources that are capable of intertemporal 
decision making (e.g., insects). Rather than representing both 
the magnitude and delay to rewards separately and making deci- 
sions based on real time calculations, upon the receipt of reward, 
such animals could store subjective value directly in memory. 
In such a case, the reward rate at the time of reward receipt
would be calculated over T_ime + t and converted to subjective
value. The decision between reward options is then simply
described as picking the option with the highest stored subjective
value. Mathematically, such a calculation is exactly equivalent
to the calculation presented in Extensions of TIMERR Theory in
Appendix.

While the advantage of this model is that it is computationally
less expensive, the disadvantages for the model are that (1)
subjective values in memory are not generalizable, i.e., the subjective
value in memory for an option will fundamentally depend on
the reward environment in which it was presented; and (2) rep- 
resentations of the reward delays could be useful for anticipatory 
behaviors. 

Evaluation of risk. Until now, we have assumed that a delayed
reward will be available for consumption, provided the animal
waits the delay, i.e., there are no explicit risks in obtaining the
reward. In many instances in nature, however, such an assumption
is not true. If the animal could build a model of the risks
involved in obtaining a delayed reward, it could do better by
including such a model in its decision making. Given information
about a delayed reward (r, t), if the animal could predict the
expected reward available for consumption after having waited
the delay [ER(r, t)], the subjective value of such a reward could be
written as

$$SV(r, t) = \frac{ER(r, t) - a_{est}\, t}{1 + \frac{t}{T_{ime}}} \qquad \text{(A1)}$$

This is based on Equation (3).

It is important to note that this equation can still be expressed
in terms of subjective time as defined in the Main Text, viz.

$$SV(r, t) = \frac{ER(r, t)}{1 + \frac{t}{T_{ime}}} - a_{est}\, ST(t)$$

Generally speaking, building such risk models is difficult,
especially since they are environment-specific. However, there
could be statistical patterns in environments for which animals
have acquired corresponding representations over evolution.
Specifically, decay of rewards arising from factors like natural
decay (rotting, for instance) or due to competition from other
foragers could have statistical patterns. During the course of travel
to a food source, competition poses the strongest cause for decay
since natural decay typically happens over a longer time-scale, viz.
days to months. In such an environment with competition from
other foragers, a forager could estimate how much a reward will
decay in the time it takes it to travel to the food source.

Suppose the forager sees a reward of magnitude r at time t = 0,
the moment of decision. The aim of the forager is to calculate how
much value will be left by the time it reaches the food source, and
to use this estimate in its current decision. Let us denote the time
taken by the forager to travel to the food source by t.

We assume that the rate of decay of a reward in competition
is proportional to a power of its magnitude, implying that
larger rewards are more sought-after in competition and hence,
would decay at a faster rate. We denote the survival time of a
typical reward by t_sur and consider that after time t_sur, the reward
is entirely consumed. If, as stated above, one assumes that t_sur is
inversely related to a power α of the magnitude of a reward at any
time [r(t)], we can write that

$$t_{sur} = \frac{1}{k\, r(t)^{\alpha}}$$

where k is a constant of proportionality.

Hence, the rate of change of a value with initial magnitude r
will be

$$\frac{dr(t)}{dt} = -\frac{r(t)}{t_{sur}} = -\left(k\, r(t)^{\alpha}\right) r(t)$$

Solving this differential equation for r(t),

$$r(t) = \frac{r}{\left(1 + k\alpha\, r^{\alpha}\, t\right)^{1/\alpha}}$$

Here we set r(0) = r.

A forager could estimate the parameters k and α based on
the density of competition and other properties of the environment.
In such a case, the subjective value of a delayed reward (r, t)
should be calculated as

$$SV(r, t) = \frac{\dfrac{r}{\left(1 + k\alpha\, r^{\alpha}\, t\right)^{1/\alpha}} - a_{est}\, t}{1 + \dfrac{t}{T_{ime}}} \qquad \text{(A2)}$$

The discounting function in this case is

$$D(r, t) = \frac{\dfrac{1}{\left(1 + k\alpha\, r^{\alpha}\, t\right)^{1/\alpha}} - \dfrac{a_{est}\, t}{r}}{1 + \dfrac{t}{T_{ime}}} \qquad \text{(A3)}$$

Such a discounting function can be thought of as a quasi-hyperbolic
discounting function, and is a more general form than
Equation (3) since k = 0 returns Equation (3).
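
The decay model and Equation (A2) can be checked numerically. The Python sketch below is illustrative only: the function names and all parameter values (r0, t, k, alpha, a_est, T_ime) are hypothetical choices, not values from the text. It compares the closed-form solution for r(t) with a forward-Euler integration of the differential equation, and confirms that setting k = 0 recovers Equation (3).

```python
def decayed_reward(r, t, k, alpha):
    """Closed-form r(t) solving dr/dt = -k * r**(alpha + 1) with r(0) = r."""
    return r / (1.0 + k * alpha * r**alpha * t) ** (1.0 / alpha)


def sv_with_decay(r, t, a_est, T_ime, k, alpha):
    """Equation (A2): subjective value when the reward decays under competition."""
    return (decayed_reward(r, t, k, alpha) - a_est * t) / (1.0 + t / T_ime)


def sv_basic(r, t, a_est, T_ime):
    """Equation (3): subjective value without a risk model."""
    return (r - a_est * t) / (1.0 + t / T_ime)


def euler_decay(r0, t, k, alpha, steps=100_000):
    """Forward-Euler integration of dr/dt = -k * r**(alpha + 1), as a cross-check."""
    r, dt = r0, t / steps
    for _ in range(steps):
        r -= k * r ** (alpha + 1) * dt
    return r


# Hypothetical parameter values, chosen only for illustration.
r0, t, k, alpha = 10.0, 5.0, 0.02, 1.5
closed = decayed_reward(r0, t, k, alpha)
numeric = euler_decay(r0, t, k, alpha)
print(f"closed form: {closed:.4f}, Euler: {numeric:.4f}")
```

Under these assumed values the decayed reward is strictly smaller than r0, so Equation (A2) assigns a lower subjective value than Equation (3) for the same (r, t).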

Non-linearities in subjective value estimation. Animals do not
perceive rewards linearly (e.g., 20 L of juice is not 100 times more
valuable than 200 mL). Non-linear reward perception may reflect
the non-linear utility of rewards: too little is often insufficient
while too much is unnecessary. Further, the value of a reward
depends on the internal state of an animal (e.g., 200 mL of juice
is more valuable to a thirsty animal than a satiated animal). We
address such non-linearities as applied to TIMERR theory here.
If the non-linearities and state-dependence of magnitude perception
can be expressed by a function f(r, state), then this
function can be incorporated into Equation (3) to give

$$SV(r, t) = \frac{f(r, \text{state}) - a_{est}\, t}{1 + \frac{t}{T_{ime}}} \qquad \text{(A4)}$$



The introduction of such state-dependence and non-linearities 
may account for the anomalous "preference for spread" 
(Frederick et al., 2002; Kalenscher and Pennartz, 2008) and 
"preference for improving sequences" (Frederick et al., 2002; 
Kalenscher and Pennartz, 2008) seen in human choice behavior. 

Expected reward rate gain during the wait. We have not yet
considered the possibility that animals could expect to receive
additional rewards during the wait to delayed rewards, i.e., while
animals expect to lose an average reward rate of a_est during the
wait, there could be a different reward rate that they might,
nevertheless, expect to gain. If we denote that this additional expected
reward rate is a fraction f of a_est, then we can state that the net
expected loss of reward rate during the wait is (1 - f) a_est. This
factor can also be added to expressions of subjective value calculated
above in Equations (3), (A2), and (A4). Specifically, Equation (3)
becomes

$$SV(r, t) = \frac{r - (1 - f)\, a_{est}\, t}{1 + \frac{t}{T_{ime}}} \qquad \text{(A5)}$$



Such a factor is especially important in understanding prior 
human experiments. In abstract questions like "$100 now or $150 
a month from now?" human subjects expect an additional reward 
rate during the month and are almost certainly not making deci- 
sions with the assumption that the only reward they can obtain 
during the month is $150. 

State-dependence of discounting steepness. In the basic version
of TIMERR theory, the time window over which the algorithm
aims to maximize reward rates is the past integration interval
(T_ime) plus the time to a delayed reward. However, non-linearities
in the relationship between reward rates and fitness levels [as
discussed in Effects of Plasticity in the Past Integration Interval
(T_ime) in Appendix] could lead to state-dependent consumption
requirements. For example, in a state of extreme hunger, it might
be appropriate for the decision rule to apply a very short time
scale of discounting so as to avoid dangerously long delays to
food. However, integrating past reward rates over such extremely
short timescales could compromise the reliability of the estimated
reward rate. Hence, as a more general version of TIMERR theory,
the window over which reward rate is maximized could incorporate
a scaled down value of the interval over which past reward
rate is estimated, with the scaling factor governed by consumption
requirements. If such a scaling factor is represented by s(state),
Equation (3) would become

$$SV(r, t) = \frac{r - a_{est}\, t}{1 + \dfrac{t}{T_{ime}\, s(\text{state})}} \qquad \text{(A6)}$$



Generalized expression for subjective value. Combining
Equations (A2), (A4)-(A6), we can write a more general
expression for the subjective value of a delayed reward, including
a model of risk along with additional reward rates, state
dependences, and non-linearities in the perception of reward
magnitude:

$$SV(r, t) = \frac{\dfrac{f(r, \text{state})}{\left(1 + k\alpha\, f(r, \text{state})^{\alpha}\, t\right)^{1/\alpha}} - (1 - f)\, a_{est}\, t}{1 + \dfrac{t}{T_{ime}\, s(\text{state})}} \qquad \text{(A7)}$$

Equation (A7) is a more complete expression for the subjective 
value of delayed rewards. Such an expression could capture almost 
the entirety of experimental results, considering its inherent flex- 
ibility. However, it should be noted that even with as simple an 
expression as Equation (3), many observed experimental results 
can be explained. 
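
As a rough illustration of how the extensions combine, the following Python sketch implements one reading of Equation (A7). The function name, the keyword arguments (f_gain for the fraction f, s for s(state), utility for f(r, state)), the square-root utility, and every numeric value are assumptions made for this example only. With the default arguments the function reduces to Equation (3).

```python
import math


def sv_general(r, t, a_est, T_ime, k=0.0, alpha=1.0,
               f_gain=0.0, s=1.0, utility=lambda r: r):
    """Sketch of Equation (A7).

    utility -- perceived magnitude f(r, state); identity by default
    k, alpha -- competition-decay parameters; the decay term is applied only
                when k > 0 and assumes a positive perceived magnitude
    f_gain  -- fraction f of a_est expected to be gained during the wait
    s       -- state-dependent scaling s(state) of the past integration interval
    """
    u = utility(r)
    if k > 0.0:
        u = u / (1.0 + k * alpha * u**alpha * t) ** (1.0 / alpha)
    opportunity_cost = (1.0 - f_gain) * a_est * t
    return (u - opportunity_cost) / (1.0 + t / (T_ime * s))


# Hypothetical values: reward of 10 units after 5 time units.
basic = sv_general(10.0, 5.0, a_est=1.0, T_ime=10.0)                 # Equation (3)
with_gain = sv_general(10.0, 5.0, a_est=1.0, T_ime=10.0, f_gain=1.0) # Equation (A5), f = 1
hungry = sv_general(10.0, 5.0, a_est=1.0, T_ime=10.0, s=0.5)         # Equation (A6)
concave = sv_general(10.0, 5.0, a_est=1.0, T_ime=10.0, utility=math.sqrt)  # Equation (A4)
print(basic, with_gain, hungry, concave)
```

Keeping each extension as an optional argument makes it easy to see which term of (A7) is responsible for a given change in subjective value.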

DISCUSSION 

Implications for intertemporal choice 

Consequences of the discounting function. We rewrite Equation
(4) below, followed by its implications for intertemporal choice
in environments with positive and negative past reward rate
estimates.

$$D(r, t) = \frac{SV(r, t)}{r} = \frac{1 - \dfrac{a_{est}\, t}{r}}{1 + \dfrac{t}{T_{ime}}}$$

In an environment with positive a_est, the following predictions
can be made:

1. "Magnitude Effect" for gains: as noted in the Main Text, as r
increases, the numerator increases in value, effectively making
the discounting less steep (Figure 3C). This effect has been
experimentally observed and has been referred to as the
"magnitude" effect (Frederick et al., 2002; Kalenscher and Pennartz,
2008). TIMERR theory makes a further prediction, however,
that the size of the "magnitude" effect will depend on the size
of a_est and t. Specifically, as a_est and t increase, so does the size
of the effect.

2. "Magnitude Effect" for losses/punishments: if r is negative
(i.e., loss/punishment), the discounting function will become
steeper as the magnitude of r increases (Figures 3D, 4).
Hence, in a rewarding environment (a_est > 0), the "magnitude"
effect for punishments is in the opposite direction to the
"magnitude" effect for gains.

3. "Sign Effect": gains are discounted more steeply than
punishments of equal magnitudes. A further prediction is that
this effect will be larger for smaller reward magnitudes. This
prediction has been confirmed experimentally (Loewenstein and
Prelec, 1992; Frederick et al., 2002).

4. Differential treatment of losses/punishments: as the "magnitude"
of the punishment decreases below a_est T_ime (r > -a_est T_ime),
the discounting function becomes a monotonically
increasing function of delay. This means that the punishment
would be preferred immediately when the magnitude of
punishment is below this value. Above this value, a delayed
punishment would be preferred to an immediate punishment.
This prediction has experimental support (Frederick et al.,
2002; Kalenscher and Pennartz, 2008).

5. A reward of r delayed beyond t = r/a_est will lead to a negative
subjective value. Hence, given an option between pursuing or
forgoing this reward, the animal would only pursue (forgo) the
reward at shorter (longer) delays.
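
The five predictions for a positive a_est can be verified directly from Equation (4). The Python sketch below is a minimal check under assumed values (a_est = 1, T_ime = 10, a delay of 2, and reward magnitudes of 5 and 50, all in arbitrary units); the function and variable names are introduced here for illustration.

```python
def subjective_value(r, t, a_est, T_ime):
    """Equation (3)."""
    return (r - a_est * t) / (1.0 + t / T_ime)


def discount(r, t, a_est, T_ime):
    """Equation (4): D(r, t) = SV(r, t) / r."""
    return subjective_value(r, t, a_est, T_ime) / r


# Hypothetical rewarding environment.
a_est, T_ime, t = 1.0, 10.0, 2.0

d_small_gain = discount(5.0, t, a_est, T_ime)
d_large_gain = discount(50.0, t, a_est, T_ime)
d_small_loss = discount(-5.0, t, a_est, T_ime)
d_large_loss = discount(-50.0, t, a_est, T_ime)

# 1. Larger gains are discounted less steeply.
print("gains: ", d_small_gain, d_large_gain)
# 2. Larger losses are discounted more steeply.
print("losses:", d_small_loss, d_large_loss)
# 3. Sign effect, larger at the smaller magnitude.
print("sign gaps:", d_small_loss - d_small_gain, d_large_loss - d_large_gain)
# 4. A loss smaller than a_est * T_ime has D > 1 (its cost grows with delay).
print("small loss D > 1:", d_small_loss > 1.0)
# 5. A gain delayed beyond t = r / a_est has negative subjective value.
print("SV(5, t=6):", subjective_value(5.0, 6.0, a_est, T_ime))
```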

When understanding the reversal of the "Magnitude Effect" for
losses, it is important to keep in mind that as |r| → ∞, both losses
and gains approach the same asymptote.

$$D(r, t;\ |r| \to \infty) = \frac{1}{1 + \dfrac{t}{T_{ime}}}$$



Hence, as the magnitude of a loss increases, the size of 
the "Magnitude Effect" becomes lower and harder to detect 
(Figure 4). 

In an environment with negative a_est (i.e., net punishing
environment), all the predictions listed above would reverse trends.
Specifically,

1. "Magnitude Effect" for gains: as r increases, the discounting
becomes steeper.

2. "Magnitude Effect" for losses: as the magnitude of a
punishment increases, the discounting function becomes less
steep.

3. "Sign Effect": punishments are discounted more steeply than
gains of equal magnitudes.

4. Differential treatment of gains: as the magnitude of the gain
decreases below a_est T_ime (r < -a_est T_ime), it would be
preferred at a delay. Beyond this magnitude, the gain would be
preferred immediately.

5. A punishment of magnitude r will be treated with positive
subjective value if it is delayed beyond t = r/a_est.
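
The threshold in item 4 of both lists follows from the sign of the derivative of Equation (4) with respect to delay. The short derivation below uses only the symbols already defined and is offered as a worked check rather than as part of the original argument.

```latex
\frac{\partial D(r, t)}{\partial t}
  = \frac{\partial}{\partial t}
    \left[ \frac{1 - \dfrac{a_{est}\, t}{r}}{1 + \dfrac{t}{T_{ime}}} \right]
  = -\,\frac{\dfrac{a_{est}}{r} + \dfrac{1}{T_{ime}}}
            {\left( 1 + \dfrac{t}{T_{ime}} \right)^{2}}

% D increases with delay if and only if a_est / r < -1 / T_ime.
% For a_est > 0 this requires r < 0 and r > -a_est T_ime  (small punishments).
% For a_est < 0 this requires r > 0 and r < -a_est T_ime  (small gains).
```

Because the sign of the derivative does not depend on t, the discounting function is monotonic in delay for any fixed r, which is why a single magnitude threshold separates the two regimes.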

Animals do not maximize long-term reward rates. In typical ani- 
mal intertemporal choice experiments, in order to ensure that 
different reward options do not lead to a marked difference 
in overall experiment duration, a post-reward delay is intro- 
duced for all options such that the net duration of each trial is 
constant. In such experiments, a global-reward-rate-maximizing 
agent should always choose the larger reward, irrespective of the 
cue-reward delay, since the net time spent per trial in collecting 
any reward equals the constant trial duration. However, a pre- 
ponderance of experimental evidence shows that animals deviate 
from such ideal behavior of maximizing reward rates over the 
entire session (Stephens and Anderson, 2001; Kalenscher and 
Pennartz, 2008; Stephens, 2008). Such experimental results are 
typically interpreted to signify that animals do not, in fact, act
as reward-rate-maximizing agents (Stephens and Anderson, 2001;
Kalenscher and Pennartz, 2008; Stephens, 2008). TIMERR theory
proposes that even though animals are maximizing reward
rates, albeit under constraints of experience, post-reward delays
are not incorporated into their decision process due to limitations
of associative learning (Kacelnik and Bateson, 1996). As a
consequence, animal choice behavior in such laboratory tasks would
appear not to maximize global reward rates.
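
The contrast between a global-reward-rate maximizer and a TIMERR agent that ignores post-reward delays can be sketched as follows. The trial duration, the two options, a_est, and the two T_ime values are hypothetical, as are the function names; the TIMERR choice applies Equation (3) to the cue-reward delay only.

```python
def sv(r, t, a_est, T_ime):
    """Equation (3), evaluated on the cue-reward delay t only."""
    return (r - a_est * t) / (1.0 + t / T_ime)


# Hypothetical task: every trial lasts 30 s regardless of the option chosen.
TRIAL_DURATION = 30.0
options = {
    "small": (1.0, 2.0),   # (reward magnitude, cue-reward delay in s)
    "large": (3.0, 20.0),
}


def global_rate_choice(options, trial_duration):
    """Pick the option with the highest reward per (constant) trial duration."""
    return max(options, key=lambda name: options[name][0] / trial_duration)


def timerr_choice(options, a_est, T_ime):
    """Pick the option with the highest subjective value under Equation (3)."""
    return max(options, key=lambda name: sv(*options[name], a_est, T_ime))


print(global_rate_choice(options, TRIAL_DURATION))   # always the larger reward
print(timerr_choice(options, a_est=0.05, T_ime=5.0))
print(timerr_choice(options, a_est=0.05, T_ime=100.0))
```

With these assumed numbers, a short past integration interval leads the TIMERR agent to the smaller, sooner reward, while a long one leads it to the same choice as the global maximizer.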

TIMERR theory, however, allows for the possibility that a
variant of standard laboratory tasks in which a post-reward
delay immediately precedes another reward included in the choice
behavior would result in animals not ignoring post-reward delays.
Prior experiments evince this possibility (Stephens and Anderson,
2001). Specifically, post-reward delays are included in the decision
process by birds performing a patch leave-stay task that is
economically equivalent to standard laboratory tasks on
intertemporal choice (Stephens and Anderson, 2001). Also, as mentioned
in the Main Text, TIMERR theory allows for the inclusion of
these delays in tasks where they can be learned, e.g., when they are
explicitly cued (Pearson et al., 2010; Blanchard et al., 2013).

Effects of plasticity in the past integration interval (T_ime)

The most important implication of the TIMERR theory is that the
steepness of discounting of future rewards will depend directly on
the past integration interval, i.e., the longer you integrate over the
past, the more tolerant you will be to delays, and vice-versa. In the
above sections, the past integration interval (T_ime) was treated as
a constant. However, the purpose of the past integration interval
is to reliably estimate the baseline reward rate expected through
the delay until a future reward. Further, since T_ime determines
the temporal discounting steepness, it will also affect the rate at
which animals obtain rewards in a given environment. Hence,
depending on the reinforcement statistics of the environment, it
would be appropriate for animals to adaptively integrate reward
history over different temporal windows so as to maximize rates
of reward.

In this section, we qualitatively address the problem of
optimizing T_ime. We consider that an optimal T_ime would satisfy four
criteria: (1) obtain rewards at magnitudes and intervals that
maximize the fitness of an animal, which is accomplished partially
through (2) reliable estimation of past reward rates leading to
(3) appropriate estimations of opportunity cost for typical delays
faced by the animal with (4) minimal computational/memory
costs.

Before considering the general optimization problem for
T_ime, it is useful to consider an illustrative example. This
example ignores the last three criteria listed above and only
considers the impact of T_ime on the fitness of an animal.
Consider a hypothetical animal that typically obtains rewards
at a rate of 1 unit per hour. Suppose such an animal is
presented with a choice between (a) 2 units of reward available
after an hour, and (b) 20 units of reward available after 15 h.
The subjective values of options "a" and "b" are calculated
below for four different values of T_ime, as per Equation (3).

Subjective value of "a"    Subjective value of "b"    Chosen option

As is apparent, larger T_ime biases the choice toward option "b."
This is appropriate in order to maximize long-term reward rate,
since the long-term reward rate is higher for option "b," as shown
below.

Reward rate having chosen option "b" = 20 units in 15 h =
20/15 units/h.

Reward rate having chosen option "a" = 2 units in 1 h + 14
units in the remaining 14 h = 16/15 units/h.
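
The comparison can be reproduced from Equation (3) with a_est = 1 unit/h. In the Python sketch below, the four T_ime values are hypothetical stand-ins chosen for illustration and are not claimed to match the values used in the original table; the function names are likewise introduced here.

```python
def sv(r, t, a_est, T_ime):
    """Equation (3): subjective value of reward r delayed by t."""
    return (r - a_est * t) / (1.0 + t / T_ime)


A_EST = 1.0                 # baseline reward rate, units per hour
OPTION_A = (2.0, 1.0)       # 2 units after 1 h
OPTION_B = (20.0, 15.0)     # 20 units after 15 h


def evaluate(T_ime):
    """Return (SV of a, SV of b, chosen option) for a given T_ime in hours."""
    sv_a = sv(*OPTION_A, A_EST, T_ime)
    sv_b = sv(*OPTION_B, A_EST, T_ime)
    return sv_a, sv_b, ("a" if sv_a > sv_b else "b")


# Hypothetical T_ime values (hours).
for T_ime in (1.0, 2.0, 5.0, 50.0):
    sv_a, sv_b, choice = evaluate(T_ime)
    print(f"T_ime = {T_ime:5.1f} h  SV(a) = {sv_a:.3f}  SV(b) = {sv_b:.3f}  choose {choice}")
```

For this particular pair of options, the two subjective values are equal at T_ime = 2.5 h, so any shorter integration interval favors "a" and any longer one favors "b."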

However, if we presume that this animal evolved so as to
require a minimum reward of 2 units within every 10 h in
order to function in good health, choosing option "b" would
be inappropriate. Hence, it is clear that for this hypothetical
animal, T_ime should be much lower than 10 h. In summary,
so as to meet consumption requirements, it is inappropriate
to integrate past reward rate history over very long
times even if the animal has infinite computational/memory
resources. Keeping in mind the above example and the four
criteria listed for an optimal T_ime, we enumerate the following
disadvantages for setting inappropriately large or inappropriately
small T_ime.

Integrating over inappropriately large T_ime has at least four
disadvantages to the animal: (1) a very long T_ime is inappropriate
given consumption requirements of an animal, as illustrated
above; (2) the computational/memory costs involved in
this integration are high; (3) integrating over large time scales
in a dynamically changing environment could make the estimate
of past reward rate inappropriate for the delay to reward
(e.g., integrating over the winter and spring seasons as an estimate
of baseline reward rate expected over a delay of an hour
in the summer might prove very costly for foragers); (4) the
longer the T_ime, the harder it is to update a_est in a dynamic
environment.

Integrating over inappropriately small T_ime, on the other hand,
presents the following disadvantages: (1) the estimate of baseline
reward rate would be unreliable since integration must be carried
out over a long enough time-scale so as to appreciate the stationary
variability in an environment; (2) the estimate of baseline reward
rate might be highly inappropriate for the future delay (e.g.,
integrating over the past 1 min might be very inappropriate when the
delay to a future reward is a day); (3) the animal would more
greatly deviate from global optimality [as is clear from Equation
(3)].

In light of the above discussion, we argue that the following
relationships should hold for T_ime. In each of these relationships,
all factors other than the one considered are assumed constant.

R1. Time-dependent changes in environmental reinforcement
statistics: if an environment is unstable, i.e., the reinforcement
statistics of the environment are time-dependent, we
predict that T_ime would be lower than the timescale of the
dynamics of changes in environmental statistics.

R2. Variability of estimated reward rate: if an environment is
stable and has very low variability in the estimated reward
rate it provides to an animal, integrating over a long T_ime
would not provide a more accurate estimate of past reward
rate than integrating over a short T_ime. Hence, in order to be
better at adapting to potential changes in the environment
and minimize computational/memory costs, we predict
that in a stable environment, T_ime will reduce (increase)
as the variability in the estimated reward rate reduces
(increases).

R3. Mean of estimated reward rate: in a stable environment with
higher average reward rates, the benefit of integrating over
a long T_ime will be smaller when weighed against the
computational/memory cost involved. As an extreme example,
when the reward rate is infinity, the benefit of integrating
over long windows is infinitesimal. This is because the benefit
of integrating over a longer T_ime can be thought of as
the net gain in average reward rate over that achieved when
decisions are made with the lowest possible T_ime. If the
increase in average reward rate is solely due to an increase
in the mean (constant standard deviation) of reward
magnitudes, the proportional benefit of integrating over a large
T_ime reduces. If the increase in average reward rate is solely
due to an increase in frequency of rewards, the integration
can be carried out over a lower time to maintain the
estimation accuracy. Hence, we predict that, in general, as
average reward rates increase (decrease), T_ime will decrease
(increase).

R4. Average delays to rewards: as the average delay between
the moment of decision and receipt of rewards increases
(decreases), T_ime should increase (decrease) correspondingly.
This is because reward history calculated over a low
T_ime might be inappropriate as an estimate of baseline
reward rate for the delays until future reward.

In human experiments, it is common to give abstract
questionnaires to study preference (e.g., "which do you prefer: $100
now or $150 a month from now?"). In such tasks, setting T_ime
to be of the order of seconds or minutes might be very
inappropriate to calculate a baseline expected reward rate over the
month to a reward (R4 above). Hence, we predict that T_ime might
increase so as to match the abstract delays, allowing humans to
discount less steeply as these delays increase. Similarly, when the
choice involves delays of the order of seconds, integrating over
hours might not be appropriate and therefore, the discounting
steepness would be predicted to be higher in such experiments.
Thus, in prior experimental results (Loewenstein and Prelec,
1992; Frederick et al., 2002; Schweighofer et al., 2006; Luhmann
et al., 2008), T_ime might have changed to reflect the delays
queried.

Calculation of the estimate of past reward rate (a_est)

It must be noted that, although the calculation of a_est is
performed over a time-scale of T_ime, the particular form of
memory for past reward events is as yet unspecified. The simplest
form of a memory function is one in which rewards that were
received within a past duration of T_ime are recollected perfectly
while any reward that was received beyond this duration is
completely forgotten. A more realistic memory function will be such
that a reward that was received will be remembered accurately
with a probability depending on the time in the past at which
it was received, with the dependence being a continuous and
monotonically decreasing function. For such a function, T_ime
will be defined as twice the average recollected duration over
the probability distribution of recollection. The factor of two is
to ensure that in the simplest memory model presented above,
the longest duration at which rewards are recollected (twice the
average duration) is T_ime.

If we define local updating as updating a_est based solely on
the memory of the last reward (both magnitude and time elapsed
since its receipt), the constraint of local updating when placed on
such a general memory function necessitates it to be exponential
in time. In this case, a_est is updated as:

$$a_{est} \leftarrow \begin{cases} a_{est} + \dfrac{r}{T_{ime}}, & \text{upon receipt of reward} \\ a_{est}\, \exp\!\left(-\dfrac{t_{lastreward}}{T_{ime}}\right), & \text{otherwise} \end{cases}$$

where t_lastreward is the time elapsed since the receipt of the last
reward.
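
A local, exponential-memory update of this kind can be sketched in a few lines of Python. The sketch assumes an increment of r/T_ime on each reward and a decay time constant equal to T_ime; these constants, the class and method names, and the simulated reward schedule are all assumptions made for illustration. The estimator stores only the value of a_est just after the last reward and the time of that reward, which is what makes the update local.

```python
import math


class RewardRateEstimator:
    """Exponentially decaying estimate of past reward rate (a_est)."""

    def __init__(self, T_ime, a_est=0.0):
        self.T_ime = T_ime
        self.a_at_last_reward = a_est   # a_est just after the last reward
        self.time_of_last_reward = 0.0  # clock time of the last reward

    def current(self, now):
        """a_est at time `now`: decayed by exp(-t_lastreward / T_ime)."""
        t_lastreward = now - self.time_of_last_reward
        return self.a_at_last_reward * math.exp(-t_lastreward / self.T_ime)

    def receive(self, r, now):
        """Update upon receipt of a reward of magnitude r at time `now`."""
        self.a_at_last_reward = self.current(now) + r / self.T_ime
        self.time_of_last_reward = now


# Hypothetical schedule: 1 unit of reward every second, T_ime = 100 s.
est = RewardRateEstimator(T_ime=100.0)
for second in range(1, 2001):
    est.receive(1.0, now=float(second))
print(est.current(2000.0))  # should settle close to the true rate of 1 unit/s
```

Under these assumptions the estimate converges to roughly the true reward rate whenever the interval between rewards is short relative to T_ime, and it decays toward zero during long periods without reward.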